junior | MLOps
What is data leakage in ML pipelines?
Updated May 17, 2026
Short answer
Data leakage occurs when the training data contains information that would not be available at prediction time, such as values from the future or features derived from the target variable.
Deep explanation
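Leakage inflates offline evaluation metrics, so a model looks better in validation than it performs in production. It usually comes from improper feature engineering (features computed from the target or from future values) or from train-test contamination (test rows influencing anything that is fit on the training data). In MLOps, preventing it requires strict separation of training and evaluation data, and pipelines in which every preprocessing step is fit on the training data only.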
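One narrow form of train-test contamination can be checked mechanically: the same record appearing in both splits. The sketch below assumes each row carries a unique key; the helper name `shared_keys` and the customer IDs are hypothetical, and the check does not catch other kinds of leakage such as target-derived features.

```python
from typing import Hashable, Iterable


def shared_keys(train_keys: Iterable[Hashable], test_keys: Iterable[Hashable]) -> set:
    """Return the keys that appear in both the training and the test split."""
    return set(train_keys) & set(test_keys)


# Hypothetical customer IDs; 103 was accidentally placed in both splits.
train_ids = [101, 102, 103, 104]
test_ids = [103, 105, 106]

overlap = shared_keys(train_ids, test_ids)
if overlap:
    print(f"Contaminated split, shared keys: {sorted(overlap)}")
```

A check like this could run as a validation step before training, failing the pipeline when the overlap is non-empty.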
Real-world example
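A demand forecasting model is trained with features built from sales recorded after the date being forecast. Those figures do not exist when a real forecast is made, so the validation score overstates production accuracy.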
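For time-ordered data such as sales, one common safeguard is to split by date rather than at random, and to build features only from earlier days. This is a minimal sketch with made-up records; `split_by_cutoff` and `add_lag_feature` are illustrative names, not part of any library.

```python
from datetime import date

# Hypothetical daily sales records: (day, units_sold).
sales = [
    (date(2024, 1, 1), 10),
    (date(2024, 1, 2), 12),
    (date(2024, 1, 3), 9),
    (date(2024, 1, 4), 15),
    (date(2024, 1, 5), 11),
]


def split_by_cutoff(rows, cutoff):
    """Train on rows strictly before the cutoff, evaluate on the rest."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test


def add_lag_feature(rows):
    """Pair each day with the previous day's sales, never a later day's."""
    ordered = sorted(rows)
    # The first row has no earlier day to look back on, so it is dropped.
    return [
        {"day": day, "prev_units": ordered[i - 1][1], "units": units}
        for i, (day, units) in enumerate(ordered)
        if i > 0
    ]


train, test = split_by_cutoff(sales, cutoff=date(2024, 1, 4))
features = add_lag_feature(train)
```

Every training row predates every test row, and each feature looks backward only, which mirrors what is known at forecast time.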
Common mistakes
- Fitting preprocessing steps (scaling, imputation, encoding) on the full dataset before splitting it into train and test sets.
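To illustrate why the order of preprocessing and splitting matters, the sketch below standardizes a single feature in two ways: with statistics computed on all rows, and with statistics computed on the training rows only. The data and the helpers `fit_scaler` and `transform` are hypothetical and use only the standard library.

```python
from statistics import mean, pstdev

# Hypothetical feature values; the last two rows are the held-out test set.
values = [1.0, 2.0, 3.0, 4.0, 100.0, 120.0]
train, test = values[:4], values[4:]


def fit_scaler(data):
    """Learn standardization parameters (mean, std) from the given data only."""
    return mean(data), pstdev(data)


def transform(data, params):
    """Apply previously learned parameters without refitting."""
    mu, sigma = params
    return [(x - mu) / sigma for x in data]


# Leaky: parameters are computed from train and test rows together.
leaky_params = fit_scaler(values)

# Safe: parameters come from the training split and are reused on test.
safe_params = fit_scaler(train)
train_scaled = transform(train, safe_params)
test_scaled = transform(test, safe_params)

print("leaky mean/std:", leaky_params)
print("safe mean/std: ", safe_params)
```

In the leaky version the test rows shift the mean and standard deviation that the training data is scaled with, so information about the held-out set reaches the model before evaluation.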
Follow-up questions
- How do you detect leakage?
- How do you prevent leakage?