Classical ML - Intermediate - 12 min

Learn Logistic Regression


Last updated: 2026-05-13.

Simple theory: Logistic regression is a classification model. It computes a linear score from features, converts that score into a probability, and then uses a threshold to choose a class.

Despite its name, logistic regression is a classification algorithm. Rather than predicting an unbounded quantity, it predicts a probability: 'What's the chance this email is spam?' or 'How likely is this tumour malignant?' The output is always between 0 and 1 and is read as the estimated probability of the positive class (class 1). Then you apply a threshold, commonly 0.5: above 0.5 = class 1, below 0.5 = class 0.
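
The three steps (linear score, probability, threshold) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name predict, the two feature values, the weights, and the bias are all made up for the example, and sending a probability of exactly 0.5 to class 1 is an assumption the lesson does not specify.

```python
import math

def predict(features, weights, bias, threshold=0.5):
    """Return (probability of class 1, predicted class) for one example."""
    # Step 1: linear score from the features.
    score = sum(w * x for w, x in zip(weights, features)) + bias
    # Step 2: convert the score into a probability with the logistic function.
    probability = 1.0 / (1.0 + math.exp(-score))
    # Step 3: apply the threshold (ties go to class 1 here, by assumption).
    label = 1 if probability >= threshold else 0
    return probability, label

# Hypothetical example: two made-up features with made-up weights.
weights = [1.2, -0.8]
bias = -0.5
probability, label = predict([2.0, 1.0], weights, bias)
print(f"probability of class 1: {probability:.2f}, predicted class: {label}")
```

Raising the threshold (for instance, predict(..., threshold=0.9)) makes the model more reluctant to predict class 1 without changing the probability itself.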

The sigmoid function

A linear model outputs any number from −∞ to +∞, so its raw score cannot be read as a probability. The sigmoid function squashes any real number into the range (0, 1).

σ(z) = 1 / (1 + e⁻ᶻ)

z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b   (linear combination of features)

σ(−5)  ≈ 0.007  →  very likely class 0
σ(0)   = 0.5    →  decision boundary (maximum uncertainty)
σ(+5)  ≈ 0.993  →  very likely class 1

The sigmoid output is your model's probability estimate for class 1.
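
To check the values above, here is a small sketch of the sigmoid in plain Python. The split into two branches is a common numerical-stability trick rather than something the lesson prescribes: both branches compute the same function, but each avoids calling exp on a large positive number.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, e^(-z) can overflow, so use the equivalent e^z / (1 + e^z).
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 3))
```

The printed values should match the rounded figures listed above. Note also that sigmoid(z) + sigmoid(−z) = 1, which is why the curve is symmetric around the 0.5 point at z = 0.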

Training with cross-entropy loss

Cross-Entropy Loss = −[ y·log(ŷ) + (1−y)·log(1−ŷ) ]

y   = true label (0 or 1)
ŷ   = model's predicted probability that the label is 1
log = natural logarithm

If y=1 and ŷ→0:  loss → ∞  (confidently wrong = huge penalty)
If y=1 and ŷ→1:  loss → 0   (confidently right = no penalty)
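
The formula can be tried directly with a short Python sketch. The function name cross_entropy and the clipping constant eps are illustrative choices: clipping keeps the predicted probability slightly away from 0 and 1 so that log(0) never occurs, at the cost of capping the loss at a large finite value instead of ∞.

```python
import math

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy for one example; y_true is 0 or 1."""
    # Clip the probability so log(0) never occurs (eps is an illustrative choice).
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# The true label is 1 in every case; only the predicted probability changes.
for y_prob in (0.99, 0.5, 0.01):
    print(y_prob, round(cross_entropy(1, y_prob), 3))
```

The loss grows from about 0.01 at ŷ = 0.99 to about 4.6 at ŷ = 0.01, and it keeps growing without bound as ŷ approaches 0.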

Cross-entropy heavily punishes confident wrong predictions, which pushes the model away from overconfident mistakes during training.
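
The lesson does not say how the loss is minimized, so the sketch below assumes plain batch gradient descent, one common choice. It relies on a standard result: for the sigmoid combined with cross-entropy, the derivative of the loss with respect to the score z is simply ŷ − y. The dataset, learning rate, step count, and all function names are made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic function, split by sign to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

def predict_proba(x, w, b):
    """Probability of class 1 for one feature vector x."""
    return sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)

def mean_cross_entropy(X, y, w, b, eps=1e-12):
    """Average cross-entropy over the dataset, with clipping to avoid log(0)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = min(max(predict_proba(x_i, w, b), eps), 1.0 - eps)
        total -= y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total / len(X)

def train(X, y, learning_rate=0.5, steps=200):
    """Fit weights and bias with plain batch gradient descent (an assumption)."""
    n_samples, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x_i, y_i in zip(X, y):
            error = predict_proba(x_i, w, b) - y_i  # d(loss)/dz for this example
            for j in range(n_features):
                grad_w[j] += error * x_i[j] / n_samples
            grad_b += error / n_samples
        # Step against the averaged gradient.
        w = [w_j - learning_rate * g_j for w_j, g_j in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

# Made-up one-feature dataset: negative values are class 0, positive are class 1.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.5], [2.5]]
y = [0, 0, 0, 1, 1, 1]

initial_loss = mean_cross_entropy(X, y, [0.0], 0.0)
w, b = train(X, y)
final_loss = mean_cross_entropy(X, y, w, b)
print(f"loss before: {initial_loss:.3f}, loss after: {final_loss:.3f}")
```

With all parameters at zero, every prediction is 0.5 and the loss equals log 2 (about 0.693); training should drive it lower. Because this toy dataset is perfectly separable, the weight keeps growing the longer you train, which is one reason practical implementations usually add regularization.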

Practice questions

  1. What does the sigmoid function output?
  2. A logistic regression model outputs 0.73 for a patient's tumour. What does this mean?
  3. Why is cross-entropy loss used instead of MSE for logistic regression?
  4. The decision boundary of a logistic regression model in 2D feature space is:
