Senior · Model Evaluation
What is evaluation contamination in LLM benchmarks?
Updated May 17, 2026
Short answer
Evaluation contamination occurs when benchmark test data (questions, reference answers, or close paraphrases of them) appears in an LLM's training corpus.
Deep explanation
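LLMs are trained on web-scale datasets, and public benchmarks, along with pages that quote or discuss them, are often published on the same web. If those items are inadvertently included in training, the model can reproduce memorized answers instead of demonstrating true generalization. This undermines benchmark validity: reported scores are inflated, and comparisons between models trained on different data become unreliable.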
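One common heuristic for detecting contamination is to look for long word n-gram overlaps between each benchmark item and the training corpus, then compare accuracy on the flagged and clean subsets. The sketch below is a minimal in-memory illustration, assuming regex word tokenization, an arbitrary n, and a toy corpus. The function names (`build_index`, `is_contaminated`, `contamination_report`) are hypothetical and not taken from any specific tool. Real training corpora would need a scalable index, such as hashed n-grams, and exact-overlap checks miss paraphrased or translated leaks.

```python
import re
from typing import Iterable


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercase, split into word tokens, and return the set of word n-grams."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(corpus: Iterable[str], n: int) -> set[tuple[str, ...]]:
    """Collect every n-gram that occurs anywhere in the training corpus."""
    index: set[tuple[str, ...]] = set()
    for doc in corpus:
        index |= ngrams(doc, n)
    return index


def is_contaminated(item: str, index: set[tuple[str, ...]], n: int) -> bool:
    """Flag an eval item if any of its n-grams also appears in the corpus.

    Items shorter than n tokens produce no n-grams and are never flagged.
    """
    return not ngrams(item, n).isdisjoint(index)


def accuracy(scores: list[bool]) -> float:
    """Fraction of correct answers; NaN for an empty subset."""
    return sum(scores) / len(scores) if scores else float("nan")


def contamination_report(
    items: list[str], correct: list[bool], corpus: Iterable[str], n: int = 8
) -> dict[str, float]:
    """Split eval items into contaminated/clean and score each subset."""
    index = build_index(corpus, n)
    flags = [is_contaminated(item, index, n) for item in items]
    clean = [c for c, f in zip(correct, flags) if not f]
    dirty = [c for c, f in zip(correct, flags) if f]
    return {
        "n_contaminated": sum(flags),
        "acc_clean": accuracy(clean),
        "acc_contaminated": accuracy(dirty),
    }


# Toy example: the first eval question was leaked verbatim into the corpus.
corpus = [
    "Q: What is the capital of France? A: Paris is the capital of France.",
    "Unrelated web text about cooking pasta at home with fresh tomatoes.",
]
items = [
    "What is the capital of France?",
    "What is the boiling point of water at sea level?",
]
correct = [True, False]  # whether the model answered each item correctly
print(contamination_report(items, correct, corpus, n=5))
```

Much higher accuracy on the contaminated subset than on the clean subset is consistent with memorization, but it is not proof on its own, since the flagged items may also simply be easier.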
Real-world example
No real-world example available yet.
Common mistakes
No common mistakes listed yet.
Follow-up questions
No follow-up questions available yet.