Simple theory: Logistic regression is a classification model. It computes a linear score from features, converts that score into a probability, and then uses a threshold to choose a class.
Despite its name, logistic regression is a classification algorithm. Rather than predicting an unbounded numeric value, it predicts a probability: 'What's the chance this email is spam?' or 'How likely is this tumour malignant?' The output is always between 0 and 1 and is read as the model's estimated probability of the positive class (class 1). You then apply a threshold, typically 0.5: a probability above 0.5 is assigned class 1, and a probability below 0.5 is assigned class 0.
The sigmoid function
Linear regression outputs any number from −∞ to +∞, so its output cannot be used directly as a probability. The sigmoid function squashes any real number into the range (0, 1).
σ(z) = 1 / (1 + e⁻ᶻ)
z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b (linear combination of features)
σ(−5) ≈ 0.007 → very likely class 0
σ(0) = 0.5 → decision boundary (maximum uncertainty)
σ(+5) ≈ 0.993 → very likely class 1
The sigmoid output is your model's probability estimate for class 1.
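The three steps described so far (linear score, sigmoid, threshold) can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `predict_proba` and `predict_class` and the example weights are assumptions chosen for this sketch, and ties at exactly 0.5 are assigned to class 1 here by convention.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real score z into (0, 1) using 1 / (1 + e^-z)."""
    # Branch on the sign of z so math.exp never receives a large positive
    # argument, which would overflow for very negative scores.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(features, weights, bias):
    """Hypothetical helper: linear score z = w·x + b, then sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_class(features, weights, bias, threshold=0.5):
    """Hypothetical helper: class 1 if the probability reaches the threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Illustrative weights only; a real model would learn these from data.
example_weights = [2.0, -1.0]
example_bias = 0.5
example_features = [1.0, 0.5]  # score z = 2.0 - 0.5 + 0.5 = 2.0

probability = predict_proba(example_features, example_weights, example_bias)
label = predict_class(example_features, example_weights, example_bias)
```

Calling `sigmoid(-5)`, `sigmoid(0)` and `sigmoid(5)` reproduces the three reference values listed above, which is a quick way to check the function.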
Training with cross-entropy loss
Cross-Entropy Loss = −[ y·log(ŷ) + (1−y)·log(1−ŷ) ]
y = true label (0 or 1)
ŷ = model's predicted probability
If y=1 and ŷ→0: loss → ∞ (confidently wrong = huge penalty)
If y=1 and ŷ→1: loss → 0 (confidently right = no penalty)
Cross-entropy heavily punishes confident wrong predictions, which is exactly what we want.
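To see the penalty grow as a prediction becomes more confidently wrong, the loss formula can be evaluated directly. The sketch below is one possible way to write it in plain Python. The function names and the `eps` clipping constant are assumptions added for this example: clipping keeps `log(0)` from being evaluated when ŷ is exactly 0 or 1, and averaging over examples is a common convention rather than something the formula itself requires.

```python
import math


def cross_entropy(y: int, y_hat: float, eps: float = 1e-12) -> float:
    """Loss for one example: -[ y*log(y_hat) + (1 - y)*log(1 - y_hat) ]."""
    # Clip the probability away from 0 and 1 (assumed safeguard) so that
    # math.log never receives 0.
    p = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mean_cross_entropy(y_true, y_prob, eps: float = 1e-12) -> float:
    """Average the per-example loss over a dataset (hypothetical helper)."""
    losses = [cross_entropy(y, p, eps) for y, p in zip(y_true, y_prob)]
    return sum(losses) / len(losses)


# True label is 1 in every case; only the predicted probability changes.
confident_right = cross_entropy(1, 0.99)  # small loss
unsure = cross_entropy(1, 0.5)            # moderate loss, equal to ln 2
confident_wrong = cross_entropy(1, 0.01)  # large loss
```

Comparing the three values shows the asymmetry: moving from 0.5 to 0.01 on a true positive increases the loss far more than moving from 0.5 to 0.99 reduces it.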