What is the impact of feature scaling on gradient descent?
Updated May 16, 2026
Short answer
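Feature scaling speeds up and stabilizes gradient descent by putting all features on comparable ranges. No single feature then dominates the gradient, and one learning rate works well for every weight.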
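Two common ways to put features on comparable ranges are standardization (z-scoring) and min-max scaling. The symbols below are introduced here for illustration: x_j is feature j, and mu_j, sigma_j, min_j, max_j are its mean, standard deviation, minimum and maximum. These statistics are typically computed on the training data and then reused unchanged for any new inputs.

```latex
\begin{aligned}
x_j' &= \frac{x_j - \mu_j}{\sigma_j} && \text{(standardization: mean 0, standard deviation 1)} \\
x_j' &= \frac{x_j - \min_j}{\max_j - \min_j} && \text{(min-max scaling to } [0, 1]\text{)}
\end{aligned}
```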
Deep explanation
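Without scaling, features with large numeric ranges dominate the updates. For a linear model, the gradient with respect to weight w_j scales with the magnitude of feature x_j, so weights attached to large-valued features receive much larger steps than weights attached to small-valued ones. The loss surface becomes a long, narrow valley: steep in some directions and nearly flat in others. A learning rate small enough to avoid oscillation or divergence along the steep directions makes progress along the flat ones very slow, so gradient descent zig-zags and needs many iterations. Scaling the features to comparable ranges makes the loss contours closer to circular (a better-conditioned problem), so a single learning rate makes steady progress in every direction.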
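The slow-convergence effect can be reproduced with a small NumPy sketch. It assumes a least-squares loss L(w) = ||Xw - y||^2 / (2n) and synthetic data invented for illustration (one feature in [0, 1], one in [0, 1000]). For this loss the Hessian is H = X^T X / n. The sketch uses the step size 1 / lambda_max(H), which is stable along every direction. It then counts iterations until the loss is within `tol` of the exact least-squares optimum. The helper `gd_iterations` and all numeric values are hypothetical choices, not a reference implementation.

```python
import numpy as np


def gd_iterations(X, y, tol=1e-8, max_iters=10_000):
    """Batch gradient descent on L(w) = ||Xw - y||^2 / (2n).

    Returns how many iterations it takes for L(w) to get within `tol`
    of the exact least-squares optimum (or `max_iters` if it never does).
    """
    n = X.shape[0]

    def loss(w):
        return np.sum((X @ w - y) ** 2) / (2 * n)

    hessian = X.T @ X / n
    lr = 1.0 / np.linalg.eigvalsh(hessian).max()   # stable: below 2 / lambda_max
    w_opt, *_ = np.linalg.lstsq(X, y, rcond=None)  # exact optimum, for reference
    best = loss(w_opt)

    w = np.zeros(X.shape[1])
    for it in range(max_iters):
        if loss(w) - best < tol:
            return it
        grad = X.T @ (X @ w - y) / n
        w -= lr * grad
    return max_iters


# Hypothetical synthetic data: two features on very different scales.
rng = np.random.default_rng(0)
n = 200
x_small = rng.uniform(0.0, 1.0, n)
x_large = rng.uniform(0.0, 1000.0, n)
y = 3.0 * x_small + 0.002 * x_large + rng.normal(0.0, 0.1, n)

features = np.column_stack([x_small, x_large])
raw = np.column_stack([np.ones(n), features])      # intercept + raw features

# Standardize each feature to mean 0, standard deviation 1.
z = (features - features.mean(axis=0)) / features.std(axis=0)
scaled = np.column_stack([np.ones(n), z])          # intercept + scaled features

print("condition number, raw:   ", np.linalg.cond(raw.T @ raw))
print("condition number, scaled:", np.linalg.cond(scaled.T @ scaled))
print("iterations, raw:   ", gd_iterations(raw, y))
print("iterations, scaled:", gd_iterations(scaled, y))
```

On this data, the raw design has a condition number in the millions and should hit the 10,000-iteration cap. The standardized design should converge within a handful of iterations. Both designs include an intercept column and span the same column space, so they share the same optimal loss, which makes the iteration counts directly comparable.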
Real-world example
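Inputs are routinely scaled, most often standardized, before training neural networks and logistic regression models with gradient-based optimizers. A typical case is a dataset where one feature is an annual income in the tens of thousands and another is an age in years.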
Common mistakes
- Training gradient-descent-based models on unscaled numerical inputs.
Follow-up questions
- Why does gradient descent fail without scaling?
- Does SGD require scaling?