Simple theory: Gradient descent is the update rule models use to reduce error. It looks at the slope of the loss and takes small steps downhill until the loss becomes lower.
You're blindfolded on a hilly mountain, and you want to reach the lowest valley. What do you do? You feel the slope under your feet and take a step in the downhill direction. Repeat. That's Gradient Descent.
The Learning Rate matters a lot
If the learning rate is too high: the ball bounces back and forth, never settling. If too low: it crawls painfully slowly. The sweet spot is chosen carefully — this is a major skill in deep learning.