Math for ML - Intermediate - 8 min

Learn Entropy & Information

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

Last updated: 2026-05-13.

Information theory answers a deceptively simple question: how much do you learn when something happens? Surprising events carry more information than predictable ones. If someone tells you 'the sun rose this morning', you learned nothing — you already knew it. If they say 'it snowed in July', you learned a lot. Entropy quantifies this precisely.

The entropy formula

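H(X) = −Σ p(x) · log₂ p(x), where the sum runs over every possible outcome x. For each outcome: multiply its probability by the base-2 log of its probability, sum the results, and negate. Because the log of a probability is never positive, the negation makes entropy non-negative. Outcomes with probability 0 contribute nothing, by the convention 0 · log₂ 0 = 0. The result is measured in bits. For a fair coin: H = −(0.5 · log₂ 0.5) − (0.5 · log₂ 0.5) = −(0.5 · −1) − (0.5 · −1) = 1 bit.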
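The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the function name `entropy_bits` and the example distributions are assumptions made for this sketch.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution.

    `probs` is assumed to be a list of probabilities that sum to 1.
    Outcomes with probability 0 are skipped (convention: 0 * log2(0) = 0).
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Fair coin, matching the worked example in the text.
fair_coin = entropy_bits([0.5, 0.5])       # 1.0 bit

# Four equally likely outcomes (hypothetical example).
four_way = entropy_bits([0.25] * 4)        # 2.0 bits

print(fair_coin, four_way)
```

Each doubling of the number of equally likely outcomes adds one bit, which matches the reading of a bit as one yes/no question.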

Cross-entropy loss in ML

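Suppose a classifier outputs probabilities [0.7, 0.2, 0.1] and the true label is class 2. Cross-entropy loss measures how far the predicted distribution is from the true one: H(true, predicted) = −Σ true(x) · log predicted(x), usually computed with the natural log in ML. With a one-hot true label, every term except the true class drops out, so the loss is simply −log of the probability the model assigned to the correct class. Cross-entropy can never fall below the entropy of the true distribution, and a perfect prediction reaches that minimum (0 for a one-hot label). A confident wrong prediction pushes the correct class's probability toward 0, so the loss grows without bound. This steep penalty for confident mistakes is a large part of why cross-entropy is the standard loss for most classification models.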
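As a rough sketch of the calculation, the Python below evaluates cross-entropy for the prediction [0.7, 0.2, 0.1]. Several things here are assumptions for illustration: the function name `cross_entropy`, the `eps` guard against log(0), the use of the natural log, and the reading of "class 2" as the second entry of the vector (with 0-based indexing it would be the third).

```python
import math

def cross_entropy(true_dist, predicted, eps=1e-12):
    """Cross-entropy in nats: -sum(t * ln(q)) over all classes.

    Terms where the true probability is 0 are skipped.
    `eps` is a hypothetical guard that keeps log() away from 0.
    """
    return -sum(
        t * math.log(max(q, eps))
        for t, q in zip(true_dist, predicted)
        if t > 0
    )

predicted = [0.7, 0.2, 0.1]
true_one_hot = [0.0, 1.0, 0.0]  # assumption: "class 2" is the second entry

loss = cross_entropy(true_one_hot, predicted)  # -ln(0.2), about 1.609

# A confident wrong prediction is penalized much more heavily.
overconfident_wrong = cross_entropy(true_one_hot, [0.98, 0.01, 0.01])  # -ln(0.01), about 4.605

print(round(loss, 3), round(overconfident_wrong, 3))
```

With a one-hot label only the true class contributes, so the loss reduces to the negative log of one number: the probability assigned to the correct class.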

Practice questions

  1. Which distribution has the maximum possible entropy?
  2. What is the entropy of a certain event (probability = 1.0)?
  3. Why is cross-entropy used as the loss function for classification models?
  4. A biased coin lands heads 90% of the time. Compared to a fair coin, its entropy is:
