Math for ML - Intermediate - 12 min

Learn Gradients

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

Last updated: 2026-05-13.

A derivative tells you the slope in one direction. A gradient collects the slopes along every coordinate axis into a single vector, from which the slope in any direction can be computed, and, crucially, it points in the direction of steepest ascent. In machine learning, the gradient of the loss function is the compass that tells every weight which way to move.

Partial derivatives

For a function of two variables f(x, y), the partial derivative ∂f/∂x is the slope when you move only in the x-direction (holding y fixed). ∂f/∂y is the slope in the y-direction alone. The gradient combines them into one arrow pointing directly uphill.

∇f(x, y) = [ ∂f/∂x,  ∂f/∂y ]

For f(x,y) = x² + y²:
  ∂f/∂x = 2x
  ∂f/∂y = 2y
  ∇f     = [2x, 2y]
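
As a quick numerical check of the worked example, the sketch below approximates each partial derivative with a central finite difference and compares it with the hand-derived result [2x, 2y]. The helper name numerical_gradient, the step size h, and the test point (2, 3) are illustrative choices, not part of the lesson.

```python
def f(x, y):
    """The example function f(x, y) = x^2 + y^2."""
    return x**2 + y**2


def analytic_gradient(x, y):
    """Gradient derived by hand: [2x, 2y]."""
    return [2 * x, 2 * y]


def numerical_gradient(func, x, y, h=1e-5):
    """Approximate [df/dx, df/dy] with central differences (hypothetical helper)."""
    # Nudge x only, holding y fixed, to estimate df/dx.
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    # Nudge y only, holding x fixed, to estimate df/dy.
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return [df_dx, df_dy]


exact = analytic_gradient(2.0, 3.0)
approx = numerical_gradient(f, 2.0, 3.0)
print("analytic: ", exact)
print("numerical:", approx)
```

The two results should agree to several decimal places. The same kind of comparison is a common way to sanity-check a hand-written gradient.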

The gradient is a vector of all partial derivatives

Gradient in a neural network

A network with millions of weights has a loss function of millions of variables. Its gradient is a vector of millions of partial derivatives, one per weight. Backpropagation computes all of them in one efficient backward pass using the chain rule. Gradient descent then moves each weight against its own partial derivative, so the weight vector as a whole steps in the direction of −∇Loss.
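
To make the chain rule concrete, here is a minimal sketch for the smallest possible "network": one weight w, one bias b, and a squared-error loss. The model, the function names, and the sample values are assumptions chosen for illustration. Real frameworks automate the same bookkeeping across millions of weights.

```python
def forward(w, b, x, target):
    """Forward pass: prediction, then squared-error loss."""
    pred = w * x + b
    loss = (pred - target) ** 2
    return pred, loss


def backward(w, b, x, target):
    """Backward pass: apply the chain rule from the loss back to w and b."""
    pred, _ = forward(w, b, x, target)
    dloss_dpred = 2 * (pred - target)  # d(loss)/d(pred)
    dpred_dw = x                       # d(pred)/d(w)
    dpred_db = 1.0                     # d(pred)/d(b)
    # Chain rule: multiply the local derivatives along each path.
    return dloss_dpred * dpred_dw, dloss_dpred * dpred_db


# Illustrative values only.
w, b, x, target = 0.5, 0.1, 2.0, 3.0
grad_w, grad_b = backward(w, b, x, target)
print("dLoss/dw =", grad_w)
print("dLoss/db =", grad_b)
```

Both partial derivatives reuse the shared factor d(loss)/d(pred). That reuse of intermediate results is what lets a single backward pass cover every weight.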

Weight update rule:
  w ← w − lr × ∂Loss/∂w

Applied to every weight simultaneously each iteration.

lr = learning rate (step size). Smaller lr = more cautious steps but slower progress. Too large an lr can overshoot the minimum.
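
The update rule can be sketched on the earlier example f(x, y) = x² + y², treating x and y as the two "weights" and f as the loss. The starting point, learning rate, and step count below are arbitrary illustrative values, and gradient_descent is a hypothetical helper name.

```python
def gradient(w):
    """Gradient of f(x, y) = x^2 + y^2, i.e. [2x, 2y]."""
    return [2 * w[0], 2 * w[1]]


def gradient_descent(w, lr=0.1, steps=50):
    """Repeatedly apply w <- w - lr * dLoss/dw to every weight."""
    for _ in range(steps):
        grad = gradient(w)
        # Update all weights together using the same gradient evaluation.
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


start = [2.0, 3.0]
final = gradient_descent(start)
print("start:", start, "-> final:", final)
```

For this particular function each step multiplies both weights by (1 − 2 × lr), so lr = 0.1 shrinks them by a factor of 0.8 per step toward the minimum at (0, 0), while an lr above 1.0 would make them grow instead.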

Practice questions

  1. What does the gradient vector point toward?
  2. For f(x,y) = x² + y², what is the gradient ∇f at point (2, 3)?
  3. Why do we use the negative gradient in gradient descent?
  4. What does backpropagation compute in a neural network?
