Learn Gradients

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

A derivative tells you the slope in one direction. A gradient bundles the slopes along every input axis into a single vector, and, crucially, that vector points in the direction of steepest uphill. In machine learning, the gradient of the loss function is the compass that tells every weight which way to move.

Partial derivatives

For a function of two variables f(x, y), the partial derivative ∂f/∂x is the slope when you move only in the x-direction (holding y fixed). ∂f/∂y is the slope in the y-direction alone. The gradient combines them into one arrow pointing directly uphill.

∇f(x, y) = [ ∂f/∂x,  ∂f/∂y ]

For f(x,y) = x² + y²:
  ∂f/∂x = 2x
  ∂f/∂y = 2y
  ∇f     = [2x, 2y]

The gradient is a vector of all partial derivatives
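
As a quick sanity check, the worked example can be compared against a numerical estimate. The sketch below is illustrative only: the function names, the test point (3, -1), and the step size h are assumptions chosen for the demonstration, not part of the lesson.

```python
def f(x, y):
    # The example function from the lesson: f(x, y) = x^2 + y^2
    return x**2 + y**2

def analytic_gradient(x, y):
    # The partial derivatives derived by hand: [2x, 2y]
    return [2 * x, 2 * y]

def numerical_gradient(func, x, y, h=1e-6):
    # Central differences: nudge one variable while holding the other fixed
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return [df_dx, df_dy]

point = (3.0, -1.0)  # arbitrary test point (assumption)
exact = analytic_gradient(*point)
approx = numerical_gradient(f, *point)
print("analytic: ", exact)
print("numerical:", approx)
```

The two results should agree to several decimal places. The negative y-component says that f increases as y moves further below zero, so the arrow points away from the minimum at the origin.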

Gradient in a neural network

A network with millions of weights has a loss function of millions of variables. The gradient is a vector of millions of partial derivatives, one per weight. Backpropagation computes all of them in a single backward pass using the chain rule, then each weight takes a small step in the direction of −∇Loss, the direction in which the loss falls fastest.
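
To make the chain rule concrete, here is a minimal sketch for the smallest possible "network": one weight, one input, and a squared-error loss. The model, the loss, and the numbers are assumptions chosen for illustration. Real backpropagation applies the same idea layer by layer across every weight.

```python
def loss(w, x, y_true):
    # Hypothetical one-weight model: prediction = w * x, squared-error loss
    prediction = w * x
    return (prediction - y_true) ** 2

def loss_gradient(w, x, y_true):
    # Chain rule: dLoss/dw = dLoss/dprediction * dprediction/dw
    prediction = w * x
    dloss_dprediction = 2 * (prediction - y_true)
    dprediction_dw = x
    return dloss_dprediction * dprediction_dw

w, x, y_true = 0.5, 2.0, 3.0  # illustrative values (assumption)
grad = loss_gradient(w, x, y_true)

# Cross-check against a central finite difference
h = 1e-6
numeric = (loss(w + h, x, y_true) - loss(w - h, x, y_true)) / (2 * h)
print("chain rule:       ", grad)
print("finite difference:", numeric)
```

Here the gradient is negative, so stepping in the direction of the negative gradient increases w, which moves the prediction toward the target.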

Weight update rule:
  w ← w − lr × ∂Loss/∂w

Applied to every weight simultaneously each iteration.

lr = learning rate (step size). Smaller lr = more cautious steps.
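
The update rule can be tried on the earlier example f(x, y) = x² + y², treating x and y as two "weights". This is a sketch under assumed values: the starting point, lr = 0.1, and the 50 iterations are arbitrary choices.

```python
def loss(weights):
    # Same bowl-shaped function as before, written for a list of weights
    return sum(w**2 for w in weights)

def gradient(weights):
    # Partial derivative of the sum of squares with respect to each weight: 2w
    return [2 * w for w in weights]

def gradient_descent_step(weights, lr):
    # Compute every partial derivative first, then update all weights together
    grads = gradient(weights)
    return [w - lr * g for w, g in zip(weights, grads)]

weights = [3.0, -1.0]  # arbitrary starting point (assumption)
lr = 0.1               # learning rate (assumption)
history = [loss(weights)]
for _ in range(50):
    weights = gradient_descent_step(weights, lr)
    history.append(loss(weights))

print("final weights:", weights)
print("first loss:", history[0], "last loss:", history[-1])
```

With lr = 0.1, each step multiplies every weight by 1 − 2 × 0.1 = 0.8, so the weights shrink steadily toward the minimum at the origin. A smaller lr gives a factor closer to 1 and slower progress.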
