junior | Cost Function
What is stochastic gradient descent?
Updated May 15, 2026
Short answer
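SGD updates model parameters using the gradient of the loss on one randomly chosen sample (or a small mini-batch) at a time, rather than on the whole dataset.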
Deep explanation
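It approximates full-batch gradient descent by estimating the gradient from a single sample or a small mini-batch. Each update is much cheaper than a pass over the whole dataset, so training is faster per step and scales to datasets too large to process at once. The estimate is noisy, but when samples are drawn uniformly at random it equals the full gradient on average.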
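To make the update loop concrete, here is a minimal sketch in Python with NumPy. It assumes a linear model trained on mean squared error. The function name sgd_linear_regression, the synthetic data, and the learning rate and epoch values are illustrative choices, not part of the original answer.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.02, epochs=20, batch_size=1, seed=0):
    """Fit y = X @ w + b by SGD on squared error (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Visit the samples in a fresh random order every epoch.
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            err = Xb @ w + b - yb                  # residuals on this batch only
            grad_w = 2.0 * Xb.T @ err / len(idx)   # gradient of batch mean squared error
            grad_b = 2.0 * err.mean()
            w -= lr * grad_w                       # step against the noisy gradient
            b -= lr * grad_b
    return w, b

# Hypothetical synthetic data: true weights [3, -2], true bias 0.5, small noise.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 0.5 + 0.01 * data_rng.normal(size=200)

w_single, b_single = sgd_linear_regression(X, y, batch_size=1)
w_mini, b_mini = sgd_linear_regression(X, y, epochs=100, batch_size=32)
print(w_single, b_single)
print(w_mini, b_mini)
```

With batch_size=1 this is single-sample SGD. A larger batch_size gives mini-batch SGD, which makes fewer but less noisy updates per epoch. Setting batch_size to the number of samples recovers full-batch gradient descent.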
Real-world example
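SGD and its variants (for example, SGD with momentum, or Adam) are used to train large-scale deep learning models, where computing the gradient over the full dataset at every step would be too expensive.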
Common mistakes
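- Assuming it always converges smoothly. Each update uses a noisy gradient estimate, so the loss fluctuates from step to step, and with a fixed learning rate the parameters tend to hover around a minimum rather than settle on it.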
Follow-up questions
- SGD vs batch gradient descent?
- Why use mini-batches?