Probability is the language of uncertainty, and machine learning is all about making good guesses under uncertainty. Many models express each prediction as a probability: 'I'm 94% sure this is a cat, 6% sure it's a dog.'
Sample space & events
The sample space is the set of all possible outcomes. For a coin: {Heads, Tails}. For a die: {1,2,3,4,5,6}. An event is any subset of outcomes you care about. For a fair die, where every outcome is equally likely, P(rolling an even number) = 3/6 = 0.5, because 3 of the 6 outcomes (2, 4 and 6) are even.
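Here is a minimal Python sketch of the same idea: the sample space and the event are plain sets, and the probability is a ratio of their sizes. It assumes a fair die (all outcomes equally likely); the helper name `probability` is just an illustrative choice.

```python
from fractions import Fraction

# Sample space for one roll of a six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

# An event is a subset of the sample space: here, "the roll is even"
even = {outcome for outcome in sample_space if outcome % 2 == 0}

def probability(event, space):
    """P(event), assuming every outcome in `space` is equally likely."""
    return Fraction(len(event & space), len(space))

print(sorted(even))                      # [2, 4, 6]
print(probability(even, sample_space))   # 1/2
```

Using `Fraction` keeps the answer exact (3/6 reduces to 1/2); `float(...)` converts it to 0.5 if you prefer a decimal.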
The Law of Large Numbers
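Flip a fair coin 10 times and you might get 7 heads. Flip it 10,000 times and the proportion of heads will almost certainly land close to 50% (typically within about one percentage point), even though the exact count rarely hits 5,000. The more trials you run, the closer your observed frequency tends to get to the true probability. This is one reason ML models benefit from large datasets: estimates computed from more data tend to be closer to the truth.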
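You can watch the Law of Large Numbers happen with a quick simulation. This sketch assumes a fair coin (`p_heads=0.5`) and uses Python's built-in `random` module with a fixed seed so the run is reproducible; the function name `heads_fraction` is hypothetical.

```python
import random

def heads_fraction(n_flips, p_heads=0.5, seed=0):
    """Simulate n_flips coin flips and return the observed fraction of heads."""
    rng = random.Random(seed)  # seeded so results are reproducible
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips

for n in (10, 100, 10_000, 1_000_000):
    frac = heads_fraction(n)
    print(f"{n:>9} flips: {frac:.4f} heads (off by {abs(frac - 0.5):.4f})")
```

The gap from 0.5 tends to shrink as `n` grows, though for any single run it will not necessarily shrink at every step: the law describes what happens in the long run, not a guarantee for each sample size.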
Why probability powers ML
Classification models typically output probabilities, for example through a softmax layer. Common loss functions such as cross-entropy penalize a model for assigning low probability to the correct answer. Bayesian models update probabilities as they see new data. Even dropout, which randomly switches off neurons during training, is a probabilistic trick. Understanding probability is non-negotiable for ML.
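To make the first two claims concrete, here is a small sketch in plain Python of softmax (raw scores to probabilities) and cross-entropy loss (negative log of the probability given to the correct class). The logits are made-up numbers chosen to reproduce the 94% cat / 6% dog example; real frameworks provide optimized versions of both functions.

```python
import math

def softmax(logits):
    """Turn raw model scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Loss = -log(probability the model assigned to the correct class)."""
    return -math.log(probs[true_index])

# Hypothetical logits for the classes ["cat", "dog"]
logits = [2.75, 0.0]
probs = softmax(logits)
print([round(p, 2) for p in probs])  # [0.94, 0.06]

print(cross_entropy(probs, 0))  # small loss: the true class (cat) got high probability
print(cross_entropy(probs, 1))  # large loss: the true class (dog) got low probability
```

The loss grows sharply as the probability on the correct class approaches zero, which is exactly the "probability error" that training tries to reduce.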