Data Mining
What is entropy in data mining and why is it important?
Updated May 15, 2026
Short answer
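Entropy measures the uncertainty, or impurity, of the class labels in a dataset. Decision tree algorithms use it to choose the splits that make the data purer.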
Deep explanation
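Entropy comes from information theory and quantifies how unpredictable an outcome is. In classification, it measures how mixed the class labels in a set are: higher entropy means more disorder. A perfectly pure set, where every example belongs to one class, has zero entropy. With two classes, entropy is highest (1 bit) when the classes are split 50/50. Entropy is foundational in decision tree algorithms. ID3 chooses the split with the highest information gain, which is the drop in entropy from a parent node to its children. C4.5 uses gain ratio, a version of information gain normalized so that it does not favor features with many distinct values.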
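The standard definitions are given below for reference. The notation is ours, not the source's. Here $p_i$ is the fraction of examples in set $S$ that belong to class $i$ out of $c$ classes, and $S_v$ is the subset of $S$ where feature $A$ takes value $v$. Using $\log_2$ gives entropy in bits.

$$
H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i, \qquad \text{with } 0 \log_2 0 := 0
$$

$$
\mathrm{IG}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
$$

Consider a node with 6 examples of one class and 2 of the other:

$$
H = -\tfrac{6}{8}\log_2\tfrac{6}{8} - \tfrac{2}{8}\log_2\tfrac{2}{8} \approx 0.311 + 0.500 = 0.811 \text{ bits}
$$

A 4/4 node gives the two-class maximum of 1 bit, and an 8/0 node gives 0.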
Real-world example
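In customer churn prediction, a decision tree uses entropy to pick the feature to split on. It prefers the feature whose split best separates customers who churn from customers who stay, which is the split with the largest information gain.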
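Below is a minimal Python sketch of that split selection. It assumes categorical features and uses a tiny made-up dataset. The feature names `contract` and `support_calls`, the data, and the helpers `entropy`, `information_gain`, and `best_split` are all hypothetical and do not come from any library.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    # Counter only yields classes that occur, so 0 * log2(0) never comes up.
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Parent entropy minus the size-weighted entropy of the child groups
    created by splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    children = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - children

def best_split(rows, labels):
    """Pick the feature with the highest information gain (ID3-style)."""
    return max(rows[0], key=lambda f: information_gain(rows, labels, f))

# Hypothetical toy data, invented for illustration only.
customers = [
    {"contract": "monthly", "support_calls": "high"},
    {"contract": "monthly", "support_calls": "low"},
    {"contract": "monthly", "support_calls": "high"},
    {"contract": "monthly", "support_calls": "low"},
    {"contract": "yearly", "support_calls": "high"},
    {"contract": "yearly", "support_calls": "low"},
    {"contract": "yearly", "support_calls": "high"},
    {"contract": "yearly", "support_calls": "low"},
]
churned = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]

for feature in customers[0]:
    print(feature, round(information_gain(customers, churned, feature), 3))
# contract 0.549
# support_calls 0.049
print("best split:", best_split(customers, churned))  # best split: contract
```

Choosing the maximum information gain mirrors ID3. C4.5 would divide each gain by the split information to get the gain ratio, and this sketch leaves that step out.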
Common mistakes
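- Confusing entropy with variance. Entropy depends only on how often each class occurs, not on the numeric values used to encode the labels.
- Treating entropy as a distance metric. Entropy describes the impurity of a single distribution, not the distance between two.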
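The following sketch illustrates the variance confusion, assuming labels stored in plain Python lists. `entropy` is a hypothetical helper, and `statistics.pvariance` comes from the standard library. Recoding or renaming the classes leaves entropy unchanged, but variance shifts with the arbitrary numeric codes.

```python
from collections import Counter
from math import log2
from statistics import pvariance

def entropy(labels):
    """Shannon entropy in bits; depends only on class frequencies."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

coded = [0, 0, 1, 1]                        # two classes coded as 0/1
recoded = [0, 0, 100, 100]                  # same classes, different numeric codes
named = ["stay", "stay", "churn", "churn"]  # same classes as strings

# Entropy is identical for all three lists: each has a 50/50 class split.
print(entropy(coded), entropy(recoded), entropy(named))  # 1.0 1.0 1.0

# Variance changes with the arbitrary codes (and is undefined for string labels).
print(pvariance(coded), pvariance(recoded))
```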
Follow-up questions
- What is information gain?
- Why is entropy used in trees?