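Gradient descent is the optimization engine behind most of machine learning. Imagine the network's error as a hilly landscape where altitude = how wrong the predictions are. Gradient descent works its way toward the bottom of a valley by repeatedly taking a small step in the steepest downhill direction. The valley it reaches is a local low point, which is not guaranteed to be the lowest point in the whole landscape.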
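To make the downhill-stepping idea concrete, here is a minimal sketch in Python. The error surface f(w) = (w - 3)^2, the starting point, and the names `gradient_descent` and `toy_grad` are assumptions chosen for illustration, not part of any particular library.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)  # move a little way downhill
    return w


# Toy error surface (an assumption for illustration): f(w) = (w - 3)^2.
# Its slope is f'(w) = 2 * (w - 3), and its lowest point is at w = 3.
def toy_grad(w):
    return 2.0 * (w - 3.0)


w_final = gradient_descent(toy_grad, w0=0.0)
print(w_final)  # ends up very close to 3.0
```

A real network has millions of weights instead of one, but each weight is updated with the same rule.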
Key concepts
- Gradient: the direction of steepest increase in error (computed by backpropagation). Gradient descent steps in the opposite direction.
- Learning rate: how big each step is. Too large and you overshoot the valley floor (or bounce ever further away from it); too small and training takes forever. It is often regarded as the single most important hyperparameter to tune in deep learning.
- Stochastic gradient descent (SGD): instead of computing the gradient over all data (slow), compute it over small random batches. Noisier but much faster.
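The three ideas above can be combined in a short sketch of mini-batch SGD. This is an illustrative example under stated assumptions: the synthetic data (y = 2x + 1 plus noise), the function name `sgd_linear_fit`, and all hyperparameter values are hypothetical, and a one-weight linear model stands in for a neural network so the gradient can be written out by hand instead of via backpropagation.

```python
import random


def sgd_linear_fit(xs, ys, learning_rate=0.05, batch_size=8, epochs=200, seed=0):
    """Fit y = w * x + b by mini-batch SGD on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # draw new random batches every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the batch's mean squared error w.r.t. w and b.
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * err * xs[i] / len(batch)
                grad_b += 2.0 * err / len(batch)
            # Step opposite to the (noisy) gradient estimate.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Synthetic data (hypothetical): y = 2x + 1 plus a little noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(64)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0.0, 0.05) for x in xs]

w_fit, b_fit = sgd_linear_fit(xs, ys)
print(w_fit, b_fit)  # should land near 2 and 1
```

Each update looks at only 8 of the 64 points, so individual steps are noisy, but the weights still settle near the values used to generate the data.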
Modern variants
- Adam: adapts the step size for each weight individually, using running averages of that weight's recent gradients; a common default choice in deep learning today.
- Momentum: accumulates velocity in directions where the gradient stays consistent, which helps training power through flat regions.
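As a rough sketch of how these two variants change the basic update, the code below applies each to a single weight on the toy surface f(w) = (w - 3)^2. The function names, the toy surface, and the learning rate of 0.1 are assumptions for illustration; the Adam constants (0.9, 0.999, 1e-8) are the commonly used defaults. Production libraries apply the same arithmetic to whole weight tensors and may differ in details.

```python
import math


def momentum_step(w, velocity, grad, learning_rate=0.1, beta=0.9):
    """One SGD-with-momentum update for a single weight."""
    # Keep a fraction of the old velocity, then add the new downhill push.
    velocity = beta * velocity - learning_rate * grad
    return w + velocity, velocity


def adam_step(w, m, v, grad, t, learning_rate=0.1,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single weight; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)               # bias correction for early steps
    v_hat = v / (1.0 - beta2 ** t)
    # Dividing by sqrt(v_hat) gives each weight its own effective step size.
    w = w - learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Toy gradient (an assumption for illustration): slope of f(w) = (w - 3)^2.
def toy_grad(w):
    return 2.0 * (w - 3.0)


w_mom, vel = 0.0, 0.0
w_adam, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    w_mom, vel = momentum_step(w_mom, vel, toy_grad(w_mom))
    w_adam, m, v = adam_step(w_adam, m, v, toy_grad(w_adam), t)

print(w_mom, w_adam)  # both should end up close to 3.0
```

Note that Adam's first step has size close to the learning rate regardless of how large the gradient is, which is part of why it tends to be less sensitive to gradient scale than plain SGD.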
See it in action
In our Neural Network Playground you can switch between SGD and Adam, change the learning rate, and watch how the loss curve responds, including what happens when the learning rate is too high.
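A rough offline approximation of that experiment, assuming a toy error surface f(w) = (w - 3)^2 in place of a real network: the sketch below records the loss at each step for three learning rates. The function name `loss_curve` and the specific rates are hypothetical choices for illustration and are not taken from the Playground.

```python
def loss_curve(learning_rate, steps=20, w0=0.0):
    """Run plain gradient descent on f(w) = (w - 3)^2 and record the loss."""
    w = w0
    losses = []
    for _ in range(steps):
        losses.append((w - 3.0) ** 2)          # loss before this update
        w -= learning_rate * 2.0 * (w - 3.0)   # gradient descent step
    return losses


for lr in (0.01, 0.1, 1.1):
    curve = loss_curve(lr)
    print(f"lr={lr}: first loss {curve[0]:.3g}, last loss {curve[-1]:.3g}")
```

On this surface the smallest rate makes slow progress, the middle rate drives the loss close to zero, and the largest rate overshoots by more on every step, so its loss grows instead of shrinking.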
Explore More Machine Learning
- Backpropagation: The algorithm that calculates how much each weight in a neural network contributed to a prediction error, so every weight can be corrected.
- Deep Learning: Machine learning using neural networks with multiple hidden layers, allowing models to learn increasingly abstract patterns from raw data.
- Neural Network: A machine learning model made of layers of simple computing units (neurons) whose connection strengths are tuned automatically from example data.