Math for ML - Beginner - 10 min

Learn Distributions

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

Last updated: 2026-05-13.

A distribution tells you where your data likes to live: which values show up often and which are rare. Most people are close to average height, and very few are 7 feet tall. That bell-shaped clustering around the middle is the Normal distribution, one of the most important shapes in statistics and machine learning.

The Normal (Gaussian) distribution

The Normal distribution is defined by two numbers: μ (mu), the mean, which sets the centre of the bell, and σ (sigma), the standard deviation, which controls how wide or narrow the bell is. A small σ gives a tall, narrow peak; a large σ gives a short, wide spread.
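
To make the effect of σ concrete, here is a minimal sketch that evaluates the standard Normal density formula using only the Python standard library. The function names normal_pdf and fraction_within, and the example values σ=0.5 and σ=2.0, are illustrative choices and not part of the lesson.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the Normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fraction_within(k):
    """Probability that a Normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


# The height of the bell at its centre (x = mu) for a narrow and a wide curve.
narrow_peak = normal_pdf(0.0, mu=0.0, sigma=0.5)  # small sigma: tall peak
wide_peak = normal_pdf(0.0, mu=0.0, sigma=2.0)    # large sigma: short peak
print(f"peak with sigma=0.5: {narrow_peak:.3f}")
print(f"peak with sigma=2.0: {wide_peak:.3f}")

# Share of the data within 1, 2 and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.4f}")
```

The peak shrinks as σ grows because the total area under the curve always stays 1, so a wider bell has to be shorter. The last loop gives roughly 68%, 95% and 99.7% for one, two and three standard deviations, whatever the values of μ and σ.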

Why distributions matter in ML

Neural network weights are often initialised with small random values drawn from a Normal distribution. Data preprocessing (standardisation) rescales each feature to have μ=0, σ=1. Batch Normalisation rescales activations to roughly zero mean and unit variance within each mini-batch during training, although it does not force them into a bell shape. The Normal distribution shows up throughout ML.
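
As a rough illustration of two of these uses, the sketch below standardises a made-up feature and draws a small weight matrix from a Normal distribution, using only the Python standard library. The heights data, the layer size (4 inputs, 3 outputs) and the initial standard deviation of 0.01 are all assumptions chosen for the example; real libraries pick the scale with schemes that depend on the layer size.

```python
import random
import statistics


def standardise(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardise a constant feature")
    return [(v - mean) / std for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical feature: 1000 heights in centimetres, mean 170 and std 10.
heights_cm = [rng.gauss(170.0, 10.0) for _ in range(1000)]
z_scores = standardise(heights_cm)

print("before:", statistics.fmean(heights_cm), statistics.pstdev(heights_cm))
print("after: ", statistics.fmean(z_scores), statistics.pstdev(z_scores))

# Weight initialisation: small random values from Normal(0, 0.01).
# The shape and the 0.01 scale are assumptions for illustration only.
n_inputs, n_outputs = 4, 3
weights = [[rng.gauss(0.0, 0.01) for _ in range(n_inputs)]
           for _ in range(n_outputs)]
for row in weights:
    print([round(w, 4) for w in row])
```

After standardisation the feature has a mean of (numerically) zero and a standard deviation of one, so features measured in different units end up on a comparable scale.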

Practice questions

  1. What does standard deviation (σ) control in a Normal distribution?
  2. Approximately what percentage of data falls within 2 standard deviations of the mean?
  3. Why do we standardise features to have mean=0 and std=1 before training?
  4. How are neural network weights typically initialised?
