What is the role of training, validation, and test sets?

Updated May 17, 2026

Short answer

Training data fits the model, validation data tunes it, and test data evaluates final performance.

Deep explanation

The training set is used to learn model parameters. The validation set is used for hyperparameter tuning and model selection. The test set is used only once to estimate real-world performance. Proper separation prevents data leakage and ensures unbiased evaluation.

Real-world example

In medical diagnosis models, validation helps tune thresholds while test data ensures safe deployment performance.

Common mistakes

  • Using test data repeatedly for tuning hyperparameters.

Follow-up questions

  • What is cross-validation?
  • Why is data leakage dangerous?

More Supervised Learning interview questions

View all →