What are L1 and L2 regularization in Gradient Descent?
Updated May 16, 2026
Short answer
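L1 adds a penalty proportional to the sum of the absolute values of the weights; L2 adds a penalty proportional to the sum of the squared weights. In both cases the penalty is added to the loss function.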
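Written out, the two penalized objectives might look like the following. The symbols are assumptions introduced here for illustration: $\mathcal{L}(w)$ is the unregularized loss, $w_i$ are the weights, and $\lambda \ge 0$ is the regularization strength. Some texts scale the L2 term by $\tfrac{1}{2}$, which changes the gradient's constant factor but not the behavior.

```latex
\begin{aligned}
J_{\text{L1}}(w) &= \mathcal{L}(w) + \lambda \sum_i |w_i|,
  &\quad \frac{\partial}{\partial w_i}\Big(\lambda \sum_j |w_j|\Big) &= \lambda\,\operatorname{sign}(w_i) \quad (w_i \neq 0),\\
J_{\text{L2}}(w) &= \mathcal{L}(w) + \lambda \sum_i w_i^2,
  &\quad \frac{\partial}{\partial w_i}\Big(\lambda \sum_j w_j^2\Big) &= 2\lambda\, w_i .
\end{aligned}
```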
Deep explanation
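L1 regularization encourages sparsity: its penalty gradient has the same magnitude regardless of how large the weight is, so small weights are pushed all the way to zero. L2 regularization discourages large weights: its penalty gradient is proportional to the weight itself, so weights shrink toward zero but rarely reach it exactly. Both modify the Gradient Descent update by adding the gradient of the penalty to the gradient of the loss.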
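A minimal sketch of how the penalty gradient enters a single update step, assuming NumPy. The function names, the regularization strength `lam`, and the learning rate `lr` are hypothetical names chosen for illustration. The data gradient is set to zero so that only the penalty's effect is visible.

```python
import numpy as np

def penalty_gradient(w, lam, kind):
    """Gradient (or subgradient) of the penalty term with respect to w."""
    if kind == "l1":
        # Penalty lam * sum(|w|); np.sign returns 0 at w == 0, a valid subgradient.
        return lam * np.sign(w)
    if kind == "l2":
        # Penalty lam * sum(w**2).
        return 2.0 * lam * w
    raise ValueError(f"unknown penalty kind: {kind!r}")

def regularized_step(w, loss_grad, lr, lam, kind):
    """One gradient descent update on loss + penalty."""
    return w - lr * (loss_grad + penalty_gradient(w, lam, kind))

w = np.array([2.0, -0.5, 0.0])
loss_grad = np.zeros_like(w)  # zero data gradient isolates the penalty's effect

w_l1 = regularized_step(w, loss_grad, lr=0.1, lam=1.0, kind="l1")
w_l2 = regularized_step(w, loss_grad, lr=0.1, lam=1.0, kind="l2")

print(w_l1)  # each nonzero weight moves toward zero by the same amount (0.1)
print(w_l2)  # each weight moves toward zero in proportion to its size
```

With these inputs the L1 step moves both nonzero weights by 0.1, while the L2 step moves the weight 2.0 by 0.4 and the weight -0.5 by 0.1.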
Real-world example
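Feature selection in high-dimensional ML problems: an L1-penalized model (lasso) drives the weights of uninformative features to exactly zero, so the features with nonzero weights form the selected subset.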
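The sketch below illustrates this on synthetic data, assuming NumPy and a linear model with mean squared error. Everything in it (the data, the `fit` helper, `lam`, the iteration count) is made up for illustration. One substitution to note: plain (sub)gradient steps with an L1 penalty rarely land on exactly zero, so the L1 branch uses a proximal gradient step with soft-thresholding (often called ISTA) instead of adding the subgradient directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 200, 20
X = rng.standard_normal((n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:3] = [3.0, -2.0, 1.5]  # only the first three features matter
y = X @ true_w + 0.1 * rng.standard_normal(n_samples)

def fit(X, y, lam, kind, n_iters=500):
    """Minimize (1/2n)||Xw - y||^2 plus an L1 or L2 penalty of strength lam."""
    n = X.shape[0]
    # Conservative step size from the curvature of the MSE term plus the L2 term.
    lr = 1.0 / (np.linalg.eigvalsh(X.T @ X / n).max() + 2.0 * lam)
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / n  # gradient of the MSE term only
        if kind == "l2":
            # Ordinary gradient step on loss + lam * sum(w**2).
            w = w - lr * (grad + 2.0 * lam * w)
        else:
            # Proximal step for lam * sum(|w|): gradient step, then soft-threshold.
            z = w - lr * grad
            w = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)
    return w

w_l1 = fit(X, y, lam=0.1, kind="l1")
w_l2 = fit(X, y, lam=0.1, kind="l2")

print("nonzero weights with L1:", np.count_nonzero(w_l1))
print("nonzero weights with L2:", np.count_nonzero(w_l2))
```

In a setup like this, the L1 fit typically keeps the informative features and zeroes out most or all of the rest, while the L2 fit leaves every weight small but nonzero.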
Common mistakes
- Confusing the L1 sparsity effect (weights driven to exactly zero) with the L2 smoothing effect (weights shrunk toward zero but kept nonzero).
Follow-up questions
- Which is better for feature selection?
- What is elastic net?