What is the role of training, validation, and test sets?
Updated May 17, 2026
Short answer
Training data fits the model, validation data tunes it, and test data evaluates final performance.
Deep explanation
The training set is used to learn model parameters. The validation set is used for hyperparameter tuning and model selection. The test set is used only once to estimate real-world performance. Proper separation prevents data leakage and ensures unbiased evaluation.
Real-world example
In medical diagnosis models, validation helps tune thresholds while test data ensures safe deployment performance.
Common mistakes
- Using test data repeatedly for tuning hyperparameters.
Follow-up questions
- What is cross-validation?
- Why is data leakage dangerous?