Senior · Supervised Learning
What is data leakage in supervised learning?
Updated May 17, 2026
Short answer
Data leakage occurs when information from outside the training dataset, such as test-set statistics or the target itself, influences the model during training.
Deep explanation
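Data leakage happens when the training data contains information that would not be available at prediction time. The model then scores unrealistically well during validation and generalizes poorly once deployed. Common sources are improper preprocessing (for example, fitting a scaler on the full dataset before the train/test split), target leakage (features derived from, or recorded after, the label), and time-series misalignment (training on observations that are later than the ones being predicted).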
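As a rough illustration of the preprocessing and time-ordering cases, here is a minimal sketch in plain Python. The feature values and the helper names (`fit_standardizer`, `transform`, `chronological_split`) are hypothetical and exist only for this example. The sketch contrasts scaling statistics computed on all rows (leaky) with statistics computed on the training rows only.

```python
from statistics import mean, pstdev


def fit_standardizer(values):
    """Learn scaling statistics from the given values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for a constant feature
    return mu, sigma


def transform(values, params):
    """Apply previously learned statistics without refitting."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


def chronological_split(rows, train_fraction=0.75):
    """Split time-ordered rows so every test row comes after every training row."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical feature values, assumed to be sorted by time already.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 100.0, 120.0]
train, test = chronological_split(feature)

# Leaky: statistics are computed on train + test, so test rows shape the training inputs.
leaky_params = fit_standardizer(feature)

# Leak-free: statistics come from the training rows only and are reused for the test rows.
clean_params = fit_standardizer(train)
train_scaled = transform(train, clean_params)
test_scaled = transform(test, clean_params)

print("leaky mean:", leaky_params[0])
print("train-only mean:", clean_params[0])
```

The two means differ because the leaky version has already seen the later, larger values. The same principle applies to any fitted preprocessing step (imputation, encoding, feature selection): fit on the training split, then apply to validation and test data.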
Real-world example
No real-world example available yet.
Common mistakes
No common mistakes listed yet.
Follow-up questions
No follow-up questions available yet.