mid | Scikit-Learn
What is data leakage in ML?
Updated May 17, 2026
Short answer
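Data leakage occurs when a model is trained with information that would not be available at prediction time, such as rows or statistics from the test set, or features derived from the target.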
Deep explanation
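Because the model has effectively seen part of the answer, evaluation scores come out overly optimistic, and the model performs worse on genuinely unseen data than those scores suggest.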
Real-world example
Training a stock price model on features that include future prices, which are not known at the time the prediction would be made.
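One way to avoid this kind of leakage is to build each row from past values only and to split chronologically. The sketch below uses plain Python; the price list, the `make_rows` helper, and the window size are all made up for illustration and do not come from any real dataset or library.

```python
# Hypothetical daily closing prices, oldest first (made-up numbers).
prices = [100.0, 101.5, 99.8, 102.3, 103.1, 101.9, 104.4, 105.0]


def make_rows(prices, window=3):
    """Build (features, target) rows whose features use only past prices."""
    rows = []
    for t in range(window, len(prices)):
        features = prices[t - window:t]  # days t-window .. t-1, never day t or later
        target = prices[t]               # the value to predict for day t
        rows.append((features, target))
    return rows


rows = make_rows(prices)

# Chronological split: train on the earliest rows, hold out the 2 most recent.
split = len(rows) - 2
train_rows, test_rows = rows[:split], rows[split:]
```

A random shuffle before splitting would mix later days into the training set, so the ordering is kept as-is here.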
Common mistakes
- Fitting preprocessing (for example, a scaler or imputer) on the entire dataset before splitting it into training and test sets.
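A common way to avoid the mistake above in scikit-learn is to split first and wrap the preprocessing and the estimator in a pipeline, so the scaler is only ever fit on training rows. This is a minimal sketch on synthetic data; the dataset, the choice of `StandardScaler` and `LogisticRegression`, and the split sizes are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Split first, so the test rows never influence any fitted statistic.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The pipeline fits the scaler on the training rows only, then reuses
# those training means and variances to transform the test rows.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# Under cross-validation the scaler is refit inside each training fold.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
```

Passing the whole pipeline to `cross_val_score`, rather than pre-scaled data, keeps each validation fold out of the scaler's statistics as well.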
Follow-up questions
- How to prevent leakage?
- What is target leakage?