seniorSupervised Learning
What is variance in machine learning and why does it cause overfitting?
Updated May 17, 2026
Short answer
Variance is the sensitivity of a model to small fluctuations in training data.
Deep explanation
Variance measures how much a model’s predictions change when trained on different subsets of data. High variance models learn noise along with patterns, leading to overfitting. Complex models like deep decision trees or high-degree polynomial regression typically have high variance. Reducing variance often involves regularization, pruning, or using ensembles.
Real-world example
A deep decision tree memorizing training customer behavior but failing on new customers.
Common mistakes
- Thinking variance only relates to dataset size.
Follow-up questions
- How do ensembles reduce variance?
- What is the effect of pruning?