What is data leakage in ML?

Updated May 17, 2026

Short answer

Data leakage occurs when the training process has access to information it would not have at prediction time, most commonly information from the test data.

Deep explanation

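Because the model has indirectly seen information it should not have, its validation and test scores are overly optimistic, and it can perform much worse on genuinely new data. Any model selection or conclusions based on those scores are therefore invalid.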
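A minimal scikit-learn sketch of how leakage inflates scores, assuming a synthetic dataset of pure noise (the sizes, the random seed, and `k=20` are arbitrary illustrative choices). Since the labels are random, an honest estimate should sit near 50% accuracy; selecting features on the full dataset before cross-validation lets the test folds influence training and pushes the score well above that.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic pure-noise data: 100 rows, 5000 features, random binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))
y = rng.integers(0, 2, size=100)

# Leaky: feature selection is fit on every row, including the rows
# that later serve as test folds inside cross-validation.
X_selected = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_score = cross_val_score(
    LogisticRegression(max_iter=1000), X_selected, y, cv=5
).mean()

# Honest: selection lives inside a pipeline, so it is refit on the
# training folds only and never sees the held-out fold.
pipeline = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest_score = cross_val_score(pipeline, X, y, cv=5).mean()

print(f"leaky CV accuracy:  {leaky_score:.2f}")
print(f"honest CV accuracy: {honest_score:.2f}")
```

The only difference between the two scores is where feature selection is fitted, which is why the gap is attributable to leakage rather than to the model.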

Real-world example

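Using future stock prices as features when training a model to predict earlier prices. Those future values are not available at prediction time, so the model's backtest accuracy cannot be reproduced in real use.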
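A minimal sketch of the leak-free setup, assuming a synthetic random-walk price series as a stand-in for real stock data. Each feature uses only the previous day's price, and scikit-learn's `TimeSeriesSplit` guarantees that every training row comes before every test row, unlike a shuffled split.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical daily closing prices in chronological order (synthetic random walk).
prices = 100 + np.cumsum(np.random.default_rng(1).normal(size=250))

# Predict day t's price from day t-1's price, so the feature only looks backward.
X = prices[:-1].reshape(-1, 1)  # price on day t-1
y = prices[1:]                  # price on day t
# A leaky variant would add prices from day t+1 or later as features.

splitter = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in splitter.split(X):
    # Every training day precedes every test day, so no future prices leak in.
    assert train_idx.max() < test_idx.min()
    print(f"train days 0-{train_idx.max()}, test days {test_idx.min()}-{test_idx.max()}")
```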

Common mistakes

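  • Fitting preprocessing steps (such as scaling, imputation, or feature selection) on the entire dataset before splitting it into training and test sets.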
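A minimal sketch of the fix for the mistake above, assuming synthetic data and `StandardScaler` as the preprocessing step (the sizes and seeds are illustrative). The scaler learns its mean and variance from the training split only and then applies that same transform to the test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features and labels, for illustration only.
rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Leaky: the scaler's statistics include the test rows.
leaky_scaler = StandardScaler().fit(X)

# Correct: learn statistics from the training split only,
# then reuse the same fitted scaler on the test split.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

With cross-validation, wrapping the scaler and the estimator in `sklearn.pipeline.make_pipeline` achieves the same effect automatically, because the whole pipeline is refit on each training fold.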

Follow-up questions

  • How do you prevent leakage?
  • What is target leakage?
