Learn Chain Rule

The chain rule is the mathematical idea that makes neural networks trainable. It answers one question: how does changing the very first weight in a deep network affect the final loss when there are 50 layers of functions in between? The answer is to multiply the local derivatives all the way through. Backpropagation is the algorithm that applies this rule efficiently, layer by layer, from the loss back to each weight.

The formula

Two functions:   d/dx f(g(x))       = f′(g(x)) · g′(x)
Three functions: d/dx f(g(h(x)))   = f′(g(h(x))) · g′(h(x)) · h′(x)

Rule: multiply the local derivatives of the functions in the chain, each evaluated at the value passed in from the function inside it.

Each factor answers one question: how sensitive is this layer's output to its input?
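
To make the three-function formula concrete, here is a minimal Python sketch that compares the chain-rule product against a finite-difference estimate. The particular functions (h(x) = x², g(u) = sin u, f(v) = e^v), the test point x0 = 0.7, and all helper names are assumptions chosen for illustration; they do not come from the lesson.

```python
import math

# Illustrative chain f(g(h(x))). These three functions are assumptions
# picked for the example, not part of the lesson.
def h(x):
    return x ** 2

def g(u):
    return math.sin(u)

def f(v):
    return math.exp(v)

# Local derivative of each function with respect to its own input.
def dh(x):
    return 2 * x

def dg(u):
    return math.cos(u)

def df(v):
    return math.exp(v)

def composite(x):
    return f(g(h(x)))

def chain_rule_derivative(x):
    # Forward pass: record the value that enters each function.
    u = h(x)
    v = g(u)
    # f'(g(h(x))) * g'(h(x)) * h'(x)
    return df(v) * dg(u) * dh(x)

def numerical_derivative(fn, x, eps=1e-6):
    # Central finite difference, used only as an independent check.
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

x0 = 0.7
analytic = chain_rule_derivative(x0)
numeric = numerical_derivative(composite, x0)
print(f"chain rule: {analytic:.8f}")
print(f"numerical:  {numeric:.8f}")
```

The two printed values should agree to several decimal places. Note that each local derivative is evaluated at the value that actually flows into that function, which is why the forward pass stores u and v before multiplying.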

Vanishing & exploding gradients

If each local derivative is somewhat less than 1 in magnitude, multiplying 100 of them gives a number near zero, and the gradient vanishes: 0.9^100 is roughly 0.00003. If each is somewhat greater than 1, the product explodes: 1.1^100 is roughly 14,000. This is why training deep networks was hard before ReLU, residual connections, and careful initialisation; they all help keep the chain product in a usable range.
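
The effect of depth on the product can be sketched in a few lines of Python. This is a simplified model that assumes every layer has the same scalar local derivative, which real networks do not; the function name chain_product and the values 0.9, 1.0, and 1.1 are hypothetical choices for illustration.

```python
def chain_product(local_derivative, depth):
    """Multiply the same local derivative `depth` times.

    Simplifying assumption: every layer contributes an identical scalar
    factor. Real networks have different, matrix-valued factors per layer.
    """
    product = 1.0
    for _ in range(depth):
        product *= local_derivative
    return product

vanishing = chain_product(0.9, 100)  # each factor a little below 1
stable = chain_product(1.0, 100)     # each factor exactly 1
exploding = chain_product(1.1, 100)  # each factor a little above 1

print(f"0.9 ** 100 = {vanishing:.3e}")
print(f"1.0 ** 100 = {stable:.3e}")
print(f"1.1 ** 100 = {exploding:.3e}")
```

Only the case with factors of exactly 1 keeps the gradient at a usable scale over 100 layers, which is the intuition behind techniques that hold per-layer derivatives close to 1.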
