What is the difference between empirical risk and expected risk in supervised learning?

Updated May 17, 2026

Short answer

Empirical risk is the average loss on the training sample, while expected risk is the expected loss over the true (unknown) data distribution.

Deep explanation

Empirical risk is the average loss computed on the available dataset, and it is the quantity models directly optimize during training. Expected risk (also called true risk) is the expected loss over the entire data distribution. Since that distribution is unknown, expected risk cannot be computed directly; supervised learning instead minimizes empirical risk while controlling the generalization gap (the difference between expected and empirical risk), so that a low empirical risk also implies a low expected risk.
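The two quantities can be written down side by side. The notation here is an assumption added for illustration and does not come from the original answer: f is the model, L is the loss function, P is the unknown data distribution, and (x_1, y_1), ..., (x_n, y_n) is a training sample drawn i.i.d. from P.

```latex
% Expected (true) risk: expectation of the loss under the unknown distribution P
R(f) = \mathbb{E}_{(x, y) \sim P}\bigl[ L(f(x), y) \bigr]

% Empirical risk: average loss over the n training examples
\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(f(x_i), y_i\bigr)

% Generalization gap: how far the computable quantity is from the one we care about
R(f) - \hat{R}_n(f)
```

Only the empirical risk can be evaluated from data. The expected risk depends on P, so in practice it is estimated, typically with examples the model was not trained on.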

Real-world example

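A recommendation model performs well on historical data but fails when user behavior patterns change: its empirical risk on the logged data is low, while its risk on the data it actually encounters after deployment is high.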

Common mistakes

  • Assuming training error equals real-world performance.
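The gap between training error and real-world performance can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a reference implementation: the synthetic distribution (a noisy sine curve), the sample sizes, the polynomial degree, and the helper names `sample` and `mse` are all hypothetical choices made for the example. Because the distribution is synthetic, a large fresh sample can stand in for the expected risk, which is not possible with real data.

```python
import numpy as np


def sample(n, rng):
    """Draw n points from an assumed synthetic distribution: y = sin(2*pi*x) + noise."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=n)
    return x, y


def mse(coeffs, x, y):
    """Average squared-error loss of the polynomial `coeffs` on the points (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


rng = np.random.default_rng(0)

# Small training sample: the only data the model is fit on.
x_train, y_train = sample(15, rng)

# Large fresh sample from the same distribution: a Monte Carlo stand-in
# for the expected risk, available here only because the data is synthetic.
x_fresh, y_fresh = sample(100_000, rng)

# Deliberately flexible model (degree-9 polynomial) fit by least squares,
# i.e. by minimizing the empirical risk under squared-error loss.
coeffs = np.polyfit(x_train, y_train, deg=9)

empirical_risk = mse(coeffs, x_train, y_train)
estimated_expected_risk = mse(coeffs, x_fresh, y_fresh)

print(f"empirical risk (training MSE):     {empirical_risk:.4f}")
print(f"estimated expected risk (fresh):   {estimated_expected_risk:.4f}")
print(f"estimated generalization gap:      {estimated_expected_risk - empirical_risk:.4f}")
```

In this setup the training error should come out noticeably lower than the error on fresh data, which is why reporting training error alone overstates how well a model will perform.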

Follow-up questions

  • Why can't expected risk be computed directly?
  • What bridges empirical and expected risk?
