junior · MLOps

What is data leakage in ML pipelines?

Updated May 17, 2026

Short answer

Data leakage occurs when the training data contains information that will not be available at prediction time, such as values from the future or features derived from the target variable.

Deep explanation

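Leakage produces overly optimistic evaluation metrics: the model scores well offline because it has seen information it will not have in production, then underperforms once deployed. It usually comes from improper feature engineering (features computed from the target or from future data) or from train-test contamination (the same or near-duplicate records appearing in both splits). In MLOps, preventing it requires strict separation of training and evaluation data, and pipelines in which every transformation is fitted on training data only.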
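One form of train-test contamination can be checked mechanically. The sketch below is a minimal illustration, assuming each record is a hashable tuple; the helper name find_overlap and the sample rows are hypothetical, and real pipelines would more likely compare on a record ID or a hash of the row.

```python
def find_overlap(train_rows, test_rows):
    """Return the test rows that also appear in the training split."""
    train_set = set(train_rows)  # set lookup keeps the check linear
    return [row for row in test_rows if row in train_set]


# Hypothetical records: (customer_id, segment)
train_rows = [(1, "a"), (2, "b"), (3, "c")]
test_rows = [(3, "c"), (4, "d")]

overlap = find_overlap(train_rows, test_rows)
if overlap:
    print(f"{len(overlap)} test row(s) also appear in training data: {overlap}")
```

An exact-match check like this only catches verbatim duplicates. Near-duplicates and leaky features need other checks.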

Real-world example

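A demand forecasting model is trained with features that include sales figures from after the prediction date. It scores well offline but fails in production, where those figures do not exist yet.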
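A common way to avoid this in forecasting is to build features only from past values and to split by time rather than at random. The following is a minimal sketch under assumed data: the daily sales series, the one-day lag, and the cutoff date are all made up for illustration, and add_lag_feature and time_split are hypothetical helpers.

```python
from datetime import date, timedelta

# Hypothetical daily sales: (day, units_sold)
start = date(2024, 1, 1)
sales = [(start + timedelta(days=i), 100 + i) for i in range(10)]


def add_lag_feature(rows, lag=1):
    """Pair each day's target with the sales observed `lag` days earlier."""
    out = []
    for i in range(lag, len(rows)):
        day, target = rows[i]
        # The feature comes strictly from the past, never from day i or later.
        out.append({"day": day, "sales_lag": rows[i - lag][1], "target": target})
    return out


def time_split(rows, cutoff):
    """Train on everything before the cutoff, evaluate on the cutoff and after."""
    train = [r for r in rows if r["day"] < cutoff]
    test = [r for r in rows if r["day"] >= cutoff]
    return train, test


rows = add_lag_feature(sales, lag=1)
train, test = time_split(rows, cutoff=date(2024, 1, 8))
```

With a random split, rows from later dates could land in the training set and let the model learn from the period it is later evaluated on.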

Common mistakes

  • Fitting preprocessing steps (scaling, imputation, encoding) on the full dataset before splitting it into train and test sets.
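The preprocessing mistake can be illustrated with a standard scaler written in plain Python. This is a sketch rather than a production pipeline: fit_scaler and transform are hypothetical helpers, and the synthetic data deliberately gives the test set a shifted distribution so the effect is visible.

```python
import random
import statistics


def fit_scaler(values):
    """Compute scaling statistics (mean, population std) from the given values only."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]


random.seed(0)
# Synthetic feature; the test rows come from a shifted distribution (assumption).
train = [random.gauss(0.0, 1.0) for _ in range(200)]
test = [random.gauss(3.0, 1.0) for _ in range(50)]

# Leaky: statistics are computed on train + test, so test data shapes the features.
leaky_mean, leaky_std = fit_scaler(train + test)

# Correct: fit on the training split only, then apply the same statistics to both.
clean_mean, clean_std = fit_scaler(train)
train_scaled = transform(train, clean_mean, clean_std)
test_scaled = transform(test, clean_mean, clean_std)
```

In the leaky version the mean is pulled toward the test distribution, which means the training features already encode information about the data held out for evaluation.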

Follow-up questions

  • How do you detect leakage?
  • How do you prevent leakage?
