Gradient Descent

Machine Learning
Also called: "SGD", "stochastic gradient descent"

Gradient descent is the engine of machine learning. Imagine the network's error as a hilly landscape where altitude = how wrong the predictions are. Gradient descent works its way toward the bottom of a valley by repeatedly taking a small step in the steepest downhill direction. The valley it ends up in is not guaranteed to be the lowest one in the landscape.

Key concepts

Gradient: the direction of steepest increase in error (computed by backpropagation). Gradient descent steps in the opposite direction.
Learning rate: how big each step is. Too large and you overshoot the valley floor, sometimes landing further away with every step; too small and training takes forever. It is often considered the single most important hyperparameter to tune in deep learning.
Stochastic gradient descent (SGD): instead of computing the gradient over all data (slow), compute it over small random batches. Each step is noisier but much cheaper, so training usually makes progress much faster.
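
The three ideas above can be sketched in a few lines of plain Python. This is a toy illustration rather than how any particular library implements training: the loss (w - 3)^2, the synthetic data y = 2x, and the function names gradient_descent and sgd_fit_slope are all assumptions made for the example.

```python
import random

# Toy loss: (w - 3)^2, lowest at w = 3. Its gradient is 2 * (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

def gradient_descent(w0, learning_rate, steps):
    """Repeatedly step against the gradient (hypothetical helper)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)  # move opposite to the gradient
    return w

# A modest learning rate settles at the minimum; one that is too large
# overshoots further on every step and diverges.
w_good = gradient_descent(0.0, learning_rate=0.1, steps=100)
w_bad = gradient_descent(0.0, learning_rate=1.1, steps=100)

def sgd_fit_slope(xs, ys, learning_rate=0.05, batch_size=4, epochs=50, seed=0):
    """Fit y = w * x with mini-batch SGD on squared error (hypothetical helper)."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            g = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * g
    return w

# Synthetic, noise-free data with true slope 2 (an assumption for the demo).
xs = [i / 10 for i in range(1, 21)]
ys = [2.0 * x for x in xs]
w_sgd = sgd_fit_slope(xs, ys)

print(w_good, w_bad, w_sgd)
```

With these settings w_good should end up very close to 3 and w_sgd very close to 2, while w_bad moves far away from the minimum, which is the overshoot described under "Learning rate".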

Modern variants

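Adam: adapts the step size for each weight individually, using running averages of recent gradients and their squares; a common default choice in deep learning today.
Momentum: accumulates velocity in directions where successive gradients agree, which helps power through flat regions and damps zig-zagging.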
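
As a rough sketch of how the two variants differ from the plain update, here are both applied to a single weight in plain Python. The toy loss (w - 3)^2 and the helper names momentum_descent and adam are assumptions for illustration. The Adam hyperparameters beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the commonly used defaults, while the learning rate of 0.1 is chosen for this toy problem only.

```python
import math

# Toy loss: (w - 3)^2, lowest at w = 3. Its gradient is 2 * (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

def momentum_descent(grad, w0, learning_rate=0.1, beta=0.9, steps=300):
    """Gradient descent with momentum (hypothetical helper)."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # Velocity builds up while successive gradients point the same way.
        velocity = beta * velocity + grad(w)
        w -= learning_rate * velocity
    return w

def adam(grad, w0, learning_rate=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam for one weight; with many weights, each keeps its own m and v."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # running average of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running average of squared gradients
        m_hat = m / (1 - beta1 ** t)         # correct the bias from starting at zero
        v_hat = v / (1 - beta2 ** t)
        # Dividing by sqrt(v_hat) gives this weight its own effective step size.
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_momentum = momentum_descent(grad, 0.0)
w_adam = adam(grad, 0.0)
print(w_momentum, w_adam)
```

Both runs should finish close to 3. In a real network the same bookkeeping is kept separately for every weight, which is what "adapts for each weight individually" refers to.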

See it in action

In our Neural Network Playground you can switch between SGD and Adam, change the learning rate, and watch how the loss curve responds, including what happens when the learning rate is too high.