What is sampling bias in supervised learning datasets?

Updated May 17, 2026

Short answer

Sampling bias occurs when the way training data is collected makes it unrepresentative of the population the model will encounter in deployment.

Deep explanation

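Sampling bias arises when the data collection process overrepresents or underrepresents certain groups relative to the target population. A model trained on such data can score well on the training data, and on any held-out split drawn from the same skewed sample, yet perform poorly in deployment, particularly on the underrepresented groups. This is especially critical in fairness-sensitive domains such as hiring or lending. Mitigations include collecting data with a representative sampling scheme (for example, stratified sampling) and reweighting or resampling the existing data so that group proportions match the target population.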
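
One common way to rebalance an existing sample is post-stratification reweighting: each row is weighted by the ratio of its group's population share to its group's sample share. The sketch below is illustrative only. The group labels, the 80/20 sample split, and the 50/50 population shares are assumptions made up for the example, and the function name `poststratification_weights` is hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per row so weighted group shares match the population.

    sample_groups: list of group labels, one per training row.
    population_shares: dict mapping group label -> assumed true share (sums to 1).
    """
    n = len(sample_groups)
    counts = Counter(sample_groups)
    weights = []
    for group in sample_groups:
        sample_share = counts[group] / n
        # Overrepresented groups get weights below 1, underrepresented above 1.
        weights.append(population_shares[group] / sample_share)
    return weights


def weighted_share(weights, sample_groups, group):
    """Share of total weight carried by one group."""
    total = sum(weights)
    in_group = sum(w for w, g in zip(weights, sample_groups) if g == group)
    return in_group / total


# Hypothetical sample: group "A" is heavily overrepresented.
sample_groups = ["A"] * 80 + ["B"] * 20
# Assumed population shares (not taken from any real dataset).
population_shares = {"A": 0.5, "B": 0.5}

weights = poststratification_weights(sample_groups, population_shares)
print(weighted_share(weights, sample_groups, "A"))  # close to 0.5
print(weighted_share(weights, sample_groups, "B"))  # close to 0.5
```

Weights like these can typically be passed to a learner that accepts per-row sample weights. Reweighting only corrects for groups that are present and identified in the sample; it cannot recover a group that was never collected.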

Real-world example

A hiring model trained mostly on male candidates may fail to generalize to female candidates.
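
A failure like this is often hidden by a single aggregate metric, so one way to surface it is to compute accuracy separately per group. The following is a minimal sketch under stated assumptions: the labels, predictions, and group assignments are invented toy values, and `accuracy_by_group` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return a dict mapping each group label to the accuracy on its rows."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Toy evaluation set (made up for illustration): 8 male rows, 4 female rows.
groups = ["male"] * 8 + ["female"] * 4
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_group = accuracy_by_group(y_true, y_pred, groups)

print(overall)    # 0.75
print(per_group)  # {'male': 0.875, 'female': 0.5}
```

Here the overall accuracy of 0.75 masks a large gap between the two groups, which is the pattern to look for when sampling bias is suspected.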

Common mistakes

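  • Assuming more data automatically removes bias. If the additional data comes from the same skewed collection process, the bias persists at any sample size.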
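
A small simulation can make this mistake concrete. The sketch below assumes a population split evenly between two groups whose outcome means are 1.0 and 0.0, and a collection process that draws from the first group 90% of the time. All of these numbers, and the function name `biased_sample_mean`, are assumptions chosen for illustration.

```python
import random


def biased_sample_mean(n, p_group_a, rng):
    """Mean of n draws where group A is sampled with probability p_group_a."""
    total = 0.0
    for _ in range(n):
        if rng.random() < p_group_a:
            total += rng.gauss(1.0, 1.0)  # group A: assumed mean 1.0
        else:
            total += rng.gauss(0.0, 1.0)  # group B: assumed mean 0.0
    return total / n


# True population is assumed to be 50% group A and 50% group B.
TRUE_MEAN = 0.5 * 1.0 + 0.5 * 0.0
# The biased collection process reaches group A 90% of the time.
BIASED_EXPECTATION = 0.9 * 1.0 + 0.1 * 0.0

rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = {n: biased_sample_mean(n, 0.9, rng) for n in (100, 10_000, 100_000)}

for n, estimate in estimates.items():
    print(n, round(estimate, 3), "error vs true mean:", round(estimate - TRUE_MEAN, 3))
```

As the sample grows, the estimate settles near 0.9 rather than the true mean of 0.5. More data reduces variance, but it does not correct a skewed sampling process.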

Follow-up questions

  • How is sampling bias different from class imbalance?
  • How do you fix sampling bias?
