A distribution tells you where your data likes to live. Most people are close to average height, and very few are 7 feet tall. That pattern, a pile-up around the middle that thins out symmetrically on both sides, is approximately the Normal distribution, arguably the most important shape in statistics and machine learning.
The Normal (Gaussian) distribution
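It is defined by two numbers: μ (mu), the mean, which sets the centre of the bell, and σ (sigma), the standard deviation, which controls how wide or narrow the bell is. A small σ gives a tall, narrow peak; a large σ gives a short, wide spread. The area under the curve is always 1, so squeezing the bell narrower must also make it taller.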
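To make μ and σ concrete, here is a minimal sketch in plain Python (standard library only) of the Normal density f(x) = exp(−(x − μ)² / (2σ²)) / (σ√(2π)). The helper name `normal_pdf` and the sampling parameters μ=10, σ=2 are illustrative choices, not taken from any particular library. The sketch prints the peak height, 1/(σ√(2π)), for three values of σ, then draws samples with `random.gauss` to check that their mean and standard deviation land near μ and σ.

```python
import math
import random
import statistics


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the Normal distribution N(mu, sigma^2) evaluated at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the centre, measured in standard deviations
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# The peak sits at x = mu and has height 1 / (sigma * sqrt(2*pi)),
# so halving sigma doubles the peak (the total area stays 1).
for sigma in (0.5, 1.0, 2.0):
    print(f"sigma={sigma}: peak density {normal_pdf(0.0, 0.0, sigma):.3f}")

# Draw samples from N(10, 2^2) (illustrative parameters) and check that the
# sample mean and standard deviation land close to mu and sigma.
rng = random.Random(0)  # fixed seed so runs are reproducible
samples = [rng.gauss(10.0, 2.0) for _ in range(10_000)]
print(f"sample mean {statistics.fmean(samples):.2f}, sample std {statistics.stdev(samples):.2f}")
```

The three peaks come out as 0.798, 0.399 and 0.199: each doubling of σ halves the height of the bell.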
Why distributions matter in ML
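Neural network weights are commonly initialised by sampling from a Normal distribution (uniform schemes are also widely used). Standardisation, a standard preprocessing step, shifts and rescales each feature to have μ=0 and σ=1; this changes its centre and spread, not its shape. Batch Normalisation normalises each activation to zero mean and unit variance over the current mini-batch, then applies a learnable scale and shift; it does not force activations to be Normal, but it is built on the same μ and σ. The Normal distribution and its two parameters are everywhere.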
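The three techniques in the paragraph above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any framework implements them: `init_weights`, `standardise` and `batch_norm` are hypothetical helper names; the He-style standard deviation sqrt(2 / fan_in) is assumed as one common initialisation choice (Xavier/Glorot scaling is another); and `batch_norm` shows only the training-mode forward pass for a single feature, without the running statistics used at inference time. The input values are made up for illustration.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility


def init_weights(fan_in: int, fan_out: int) -> list[list[float]]:
    """Sample a fan_out x fan_in weight matrix from N(0, std^2).

    Assumption: He-style std = sqrt(2 / fan_in), a common choice for ReLU layers.
    """
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def standardise(values: list[float]) -> list[float]:
    """Shift and rescale values to mean 0, std 1 via z = (x - mean) / std."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values, mu)
    if sigma == 0:
        raise ValueError("cannot standardise a constant feature")
    return [(v - mu) / sigma for v in values]


def batch_norm(batch: list[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> list[float]:
    """Training-mode batch norm for one feature across a mini-batch.

    Normalise with the batch mean and (biased) variance, then apply the
    learnable scale gamma and shift beta. Only the mean and spread change;
    the shape of the distribution is not forced to be Normal.
    """
    mu = statistics.fmean(batch)
    var = statistics.pvariance(batch, mu)
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in batch]


weights = init_weights(fan_in=256, fan_out=128)
flat = [w for row in weights for w in row]
print(f"weights: std {statistics.pstdev(flat):.4f}, target {math.sqrt(2 / 256):.4f}")

ages = [23.0, 35.0, 31.0, 52.0, 44.0, 29.0]  # illustrative feature values
z = standardise(ages)
print(f"standardised: mean {statistics.fmean(z):.1e}, std {statistics.pstdev(z):.3f}")

activations = [0.2, 3.1, -1.4, 0.9]  # illustrative mini-batch for one unit
print([round(a, 3) for a in batch_norm(activations)])
```

Standardisation and batch norm share the same core step, subtracting the mean and dividing by the standard deviation. The difference is that batch norm recomputes those statistics on every mini-batch and then lets the network learn its own preferred scale and shift through gamma and beta.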