What are L1 and L2 regularization in Gradient Descent?

Updated May 16, 2026

Short answer

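L1 regularization adds a penalty proportional to the sum of the absolute values of the weights to the loss function; L2 adds a penalty proportional to the sum of their squares.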

Deep explanation

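L1 regularization encourages sparsity: its penalty pulls every weight toward zero with the same constant force, so weights on uninformative features tend to end up at or near zero. L2 discourages large weights: its pull is proportional to the weight itself, so weights shrink but rarely become exactly zero. Both modify the Gradient Descent update by adding the gradient of the penalty to the gradient of the loss. For a penalty strength λ, L1 adds λ·sign(w) and L2 adds 2λw (or λw when the penalty is written as (λ/2)·w²).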
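
To make the update concrete, here is a minimal sketch of a single Gradient Descent step with either penalty. It assumes a linear model with mean squared error loss; the names penalty_gradient, gd_step, lam and lr are hypothetical and chosen only for illustration.

```python
import numpy as np

def penalty_gradient(w, lam, kind):
    """Gradient (subgradient for L1) of the penalty term with respect to w."""
    if kind == "l1":
        # Penalty is lam * sum(|w|); np.sign gives 0 at w == 0, a valid subgradient.
        return lam * np.sign(w)
    if kind == "l2":
        # Penalty is lam * sum(w**2).
        return 2.0 * lam * w
    raise ValueError("kind must be 'l1' or 'l2'")

def gd_step(w, X, y, lr, lam, kind):
    """One gradient descent step on mean squared error plus the chosen penalty."""
    n = X.shape[0]
    residual = X @ w - y
    data_grad = (2.0 / n) * (X.T @ residual)  # gradient of the unregularized loss
    return w - lr * (data_grad + penalty_gradient(w, lam, kind))
```

Note that the L1 term has the same magnitude for every nonzero weight, while the L2 term scales with the weight, which is where the sparsity versus shrinkage difference comes from.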

Real-world example

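Feature selection in high-dimensional ML problems: when there are many candidate features and only a few are informative, an L1 penalty drives the weights of the uninformative ones to zero, leaving a smaller model.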
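
The feature selection use case can be sketched on synthetic data. One caveat: a plain (sub)gradient step with an L1 penalty usually leaves weights hovering near zero rather than exactly at zero, so this sketch swaps in a proximal step (soft-thresholding), which is a variant of Gradient Descent and not the plain update. The data, the function names and the values of lam, lr and steps are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink toward zero, set small values to exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_proximal_gd(X, y, lam, lr=0.1, steps=500):
    """Minimize (1/(2n)) * ||Xw - y||^2 + lam * ||w||_1 by proximal gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n  # gradient of the squared-error term only
        # The L1 penalty is applied through the soft-thresholding step.
        w = soft_threshold(w - lr * grad, lr * lam)
    return w

# Synthetic data (illustrative only): 10 features, only the first two matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[0], true_w[1] = 3.0, -2.0
y = X @ true_w + 0.1 * rng.normal(size=200)

w = lasso_proximal_gd(X, y, lam=0.5)
selected = np.flatnonzero(w)  # indices of features whose weight is not zero
print("selected features:", selected)
```

The weights that survive are shrunk toward zero relative to the true values, which is expected with an L1 penalty; a common follow-up is to refit an unpenalized model on the selected features only.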

Common mistakes

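  • Confusing the sparsity effect of L1 (weights driven to exactly zero) with the smoothing (shrinkage) effect of L2 (weights made small but usually nonzero).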

Follow-up questions

  • Which is better for feature selection?
  • What is elastic net?
