The dot product answers one question: how similar are two vectors? Two arrows pointing in the same direction have a high dot product. Two arrows pointing opposite ways have a negative one. Two arrows at 90° have zero. This single number drives recommendation systems, search engines, and attention in transformers.
The formula
Dot product of [a₁, a₂, a₃] and [b₁, b₂, b₃] = a₁b₁ + a₂b₂ + a₃b₃. Multiply element-by-element, then sum. Geometrically, it equals |A| × |B| × cos(θ) — where θ is the angle between the two vectors.
Where it shows up in ML
Every dot product in a neural network is computing similarity — how much does this input match this pattern? The attention mechanism in transformers is literally dot products between query and key vectors to find which words are most relevant to each other.