What is the impact of feature scaling on gradient descent?

Updated May 16, 2026

Short answer

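Feature scaling puts numerical features on comparable ranges, so no single feature dominates the gradient. This lets gradient descent use a larger learning rate and converge faster and more stably.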

Deep explanation

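Without scaling, a feature with large values produces correspondingly large gradient components, so it dominates the updates. The loss surface becomes a long, narrow valley: a learning rate small enough to stay stable along the steep (large-scale) direction is far too small for the shallow ones, so gradient descent either oscillates across the valley or crawls along it. Scaling makes the contours closer to round, so a single learning rate works in every direction and the optimization path becomes smooth and stable.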
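The effect is easy to reproduce. The sketch below, in plain Python with no libraries, fits a noise-free linear model y = 3·x1 + 0.002·x2 + 1 by batch gradient descent on mean squared error, once on raw features (x1 in [0, 1], x2 in [0, 1000]) and once after z-score standardization. The data, the learning rates, the tolerance, and the helper names (`make_data`, `standardize`, `gd_iterations`) are illustrative assumptions; each learning rate is hand-picked to stay below that problem's stability limit.

```python
import random

def make_data(n=100, seed=0):
    """Hypothetical data: x1 in [0, 1], x2 in [0, 1000], noise-free linear target."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.uniform(0, 1), rng.uniform(0, 1000)
        X.append([x1, x2])
        y.append(3.0 * x1 + 0.002 * x2 + 1.0)
    return X, y

def standardize(X):
    """Z-score each column: subtract its mean, divide by its population std."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]

def gd_iterations(X, y, lr, tol=1e-6, max_iters=2000):
    """Batch gradient descent on mean squared error.

    Returns the iteration at which the loss first drops below tol,
    or max_iters if it never does.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for it in range(max_iters):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) + b - t
                     for row, t in zip(X, y)]
        if sum(r * r for r in residuals) / n < tol:
            return it
        # Gradient of mean((w.x + b - y)^2): each weight's component is scaled by its feature.
        grad_w = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, X)) for j in range(d)]
        grad_b = 2.0 / n * sum(residuals)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return max_iters

X, y = make_data()
# Raw features: lr must stay tiny to remain stable along x2, so w1 and b barely move.
print("raw:   ", gd_iterations(X, y, lr=1e-6))              # 2000 (budget exhausted)
# Standardized features: one moderate lr suits every direction.
print("scaled:", gd_iterations(standardize(X), y, lr=0.1))  # well under 100
```

On the raw features, any step size large enough to move w1 and the bias noticeably would make the x2 direction diverge, so the run exhausts its iteration budget. After standardization, the same loop reaches the tolerance in a few dozen steps.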

Real-world example

Input scaling (typically standardization) is routine when training neural networks and logistic regression models with gradient-based optimizers.
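For logistic regression, the reason large-scale features dominate the updates shows up directly in the gradient. Using standard notation (assumed here, not defined elsewhere on this page): n training examples, feature vectors x_i with components x_ij, labels y_i in {0, 1}, sigmoid σ, predictions p̂_i, and mean log loss L:

$$
\hat{p}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b), \qquad
L = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i)\Big]
$$

$$
\frac{\partial L}{\partial w_j} = \frac{1}{n}\sum_{i=1}^{n}(\hat{p}_i - y_i)\,x_{ij},
\qquad
\frac{\partial L}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}(\hat{p}_i - y_i)
$$

$$
z_{ij} = \frac{x_{ij} - \mu_j}{s_j}
$$

The prediction error (p̂_i − y_i) is shared by every weight, so each weight's gradient grows roughly in proportion to the magnitude of its feature. Standardizing each feature to z_ij, with mean μ_j and standard deviation s_j computed on the training set, puts every feature on unit scale.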

Common mistakes

  • Training models without scaling numerical inputs.
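One way to avoid this mistake is to learn the scaling statistics once from the training data and reuse them for every later input. The class below is a hypothetical, minimal z-score scaler in plain Python; its fit/transform split loosely mirrors scikit-learn's `StandardScaler`, but it is not that class. The age/income rows are made-up illustration data.

```python
import math

class ZScoreScaler:
    """Hypothetical helper: z = (x - mean) / std, with stats learned from training rows only."""

    def fit(self, rows):
        n, d = len(rows), len(rows[0])
        self.means = [sum(r[j] for r in rows) / n for j in range(d)]
        self.stds = []
        for j in range(d):
            var = sum((r[j] - self.means[j]) ** 2 for r in rows) / n
            # Guard against constant columns, which would otherwise divide by zero.
            self.stds.append(math.sqrt(var) if var > 0 else 1.0)
        return self

    def transform(self, rows):
        # Apply the stored training statistics; never recompute them on new data.
        return [[(x - m) / s for x, m, s in zip(r, self.means, self.stds)] for r in rows]


# Made-up rows: [age in years, income in dollars].
train = [[25, 40_000], [35, 65_000], [45, 90_000], [55, 120_000]]
new = [[30, 50_000]]

scaler = ZScoreScaler().fit(train)   # statistics come from the training set only
train_z = scaler.transform(train)    # each column now has mean 0 and std 1
new_z = scaler.transform(new)        # same means/stds reused at prediction time
```

Reusing the training means and standard deviations at prediction time keeps new inputs on the same scale the model was trained on.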

Follow-up questions

  • Why does gradient descent oscillate or converge slowly without scaling?
  • Does SGD require scaling?
