What is min_samples_split in Decision Trees?

Updated May 16, 2026

Short answer

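It sets the minimum number of samples a node must contain before the tree is allowed to split it. In scikit-learn the default is 2, and a float value is interpreted as a fraction of the training samples rather than an absolute count.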

Deep explanation

min_samples_split prevents the tree from creating splits that are statistically unreliable. If a node holds fewer samples than this threshold, it is not split and becomes a leaf node. This reduces overfitting because splits chosen on very small subsets of the data tend to capture noise rather than real patterns.
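The stopping rule described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the internals of any library: the helper names resolve_min_samples_split and can_split are hypothetical, and the handling of float values assumes scikit-learn's documented convention (a fraction of the training set, rounded up).

```python
import math


def resolve_min_samples_split(min_samples_split, n_samples):
    """Turn the parameter into an absolute sample count.

    Assumption: an int is used as-is, while a float is treated as a
    fraction of the training set and rounded up (never below 2).
    """
    if isinstance(min_samples_split, int):
        return min_samples_split
    return max(2, math.ceil(min_samples_split * n_samples))


def can_split(node_size, threshold):
    """A node may be split only if it holds at least `threshold` samples."""
    return node_size >= threshold


# Hypothetical training set of 1000 rows.
threshold = resolve_min_samples_split(0.25, n_samples=1000)

for node_size in (400, 250, 249):
    outcome = "may be split" if can_split(node_size, threshold) else "becomes a leaf"
    print(f"node with {node_size} samples {outcome}")
```

Note that passing this check only makes a node eligible for splitting. Other limits, such as max_depth or min_samples_leaf, can still turn it into a leaf.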

Real-world example

In marketing analytics, it stops the tree from carving out customer segments from only a handful of users, whose behavior may not generalize to the wider customer base.

Common mistakes

  • Setting it too low, causing overfitting.
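One way to see the effect of a low setting is to fit two trees on the same noisy data and compare their size and their train/test gap. The sketch below assumes scikit-learn is installed. The synthetic dataset, the noise level (flip_y), and the value 100 are arbitrary choices for illustration, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise so that tiny splits can chase noise.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def fit_tree(min_samples_split):
    """Fit a tree that differs from the default only in min_samples_split."""
    tree = DecisionTreeClassifier(
        min_samples_split=min_samples_split, random_state=0
    )
    return tree.fit(X_train, y_train)


loose = fit_tree(2)     # the default: any node with 2+ samples may split
strict = fit_tree(100)  # nodes with fewer than 100 samples become leaves

for name, tree in (("min_samples_split=2", loose), ("min_samples_split=100", strict)):
    print(
        name,
        "| leaves:", tree.get_n_leaves(),
        "| train acc:", round(tree.score(X_train, y_train), 3),
        "| test acc:", round(tree.score(X_test, y_test), 3),
    )
```

A large gap between training and test accuracy for the low setting is the usual sign of overfitting. In practice the value is typically chosen by cross-validation rather than by eye.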

Follow-up questions

  • How is it different from min_samples_leaf?
  • What happens if it is set too high?
  • Should it be tuned with max_depth?
