A derivative tells you the slope in one direction. A gradient tells you the slope in every direction at once — and, crucially, which direction is steepest uphill. In machine learning, the gradient of the loss function is the compass that tells every weight which way to move.
Partial derivatives
For a function of two variables f(x, y), the partial derivative ∂f/∂x is the slope when you move only in the x-direction (holding y fixed). ∂f/∂y is the slope in the y-direction alone. The gradient combines them into one arrow pointing directly uphill.
∇f(x, y) = [ ∂f/∂x, ∂f/∂y ]
For f(x,y) = x² + y²:
∂f/∂x = 2x
∂f/∂y = 2y
∇f = [2x, 2y]The gradient is a vector of all partial derivatives
Gradient in a neural network
A network with millions of weights has a loss function of millions of variables. The gradient is a vector of millions of partial derivatives — one per weight. Backpropagation computes all of them in one efficient pass using the chain rule, then each weight steps in the direction of −∇f.
Weight update rule:
w ← w − lr × ∂Loss/∂w
Applied to every weight simultaneously each iteration.lr = learning rate (step size). Smaller lr = more cautious steps.