Senior · Supervised Learning
What is sampling bias in supervised learning datasets?
Updated May 17, 2026
Short answer
Sampling bias occurs when the training data is not representative of the population the model will face in deployment.
Deep explanation
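Sampling bias arises when the data collection process overrepresents or underrepresents certain groups relative to the target population. A model trained on such data can score well on the training set, and on any validation split drawn from the same skewed sample, yet perform poorly on the groups it rarely saw. This matters most in fairness-sensitive domains such as hiring or lending. Mitigations include sampling strategies that preserve group proportions (for example, stratified sampling) and rebalancing the data by resampling or reweighting.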
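One way to make the reweighting idea concrete is post-stratification: give each example a weight equal to its group's share in the target population divided by its share in the sample. The sketch below is illustrative only. The function name `poststratification_weights`, the 80/20 sample, and the 50/50 population shares are assumptions chosen for the example, not values from any real dataset.

```python
from collections import Counter


def poststratification_weights(groups, population_share):
    """Return one weight per example so that weighted group shares
    match the (assumed) population shares."""
    counts = Counter(groups)
    n = len(groups)
    weights = []
    for g in groups:
        sample_share = counts[g] / n
        # Underrepresented groups get weights above 1, overrepresented below 1.
        weights.append(population_share[g] / sample_share)
    return weights


# Hypothetical skewed sample: 80 examples from one group, 20 from the other.
groups = ["male"] * 80 + ["female"] * 20
# Assumed target population shares (illustrative, not measured).
population_share = {"male": 0.5, "female": 0.5}

weights = poststratification_weights(groups, population_share)

# Weighted share of each group after reweighting.
weighted_female = sum(w for g, w in zip(groups, weights) if g == "female")
weighted_share_female = weighted_female / sum(weights)
```

Many training APIs accept per-example weights of this kind. Note that reweighting only corrects for groups that appear in the sample and whose population shares are known; it cannot recover information about a group that was never collected.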
Real-world example
A hiring model trained mostly on male candidates fails to generalize to female candidates.
Common mistakes
- Assuming more data automatically removes bias. A larger sample collected the same biased way keeps the same skew.
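A small simulation can illustrate why sample size alone does not help. In this hedged sketch, the population is assumed to be split 50/50 between two groups whose outcomes average 1.0 and 0.0, so the true population mean is 0.5, while the collection process is assumed to pick the first group 80% of the time. All names and numbers here are hypothetical.

```python
import random


def biased_sample_mean(n, p_group_a=0.8, seed=0):
    """Mean outcome of n examples drawn by a collection process that
    picks group A with probability p_group_a (assumed bias)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        if rng.random() < p_group_a:
            total += rng.gauss(1.0, 1.0)  # group A outcome, mean 1.0
        else:
            total += rng.gauss(0.0, 1.0)  # group B outcome, mean 0.0
    return total / n


# True mean under the assumed 50/50 population: 0.5 * 1.0 + 0.5 * 0.0
TRUE_MEAN = 0.5

# Growing n reduces noise, but the estimate settles near the biased
# expectation 0.8 * 1.0 + 0.2 * 0.0 rather than near TRUE_MEAN.
for n in (100, 10_000, 200_000):
    print(n, round(biased_sample_mean(n), 3))
```

More data shrinks the variance of the estimate but leaves the systematic error untouched, because that error comes from how examples are selected rather than from how many there are.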
Follow-up questions
- How is sampling bias different from class imbalance?
- How do you fix sampling bias?