Introducing PrunedTree: Smarter Decision Trees with Automatic Depth Pruning

Arun Sundar

Motivation:

Decision trees are a popular choice in machine learning because they are interpretable and easy to use. However, they often suffer from a major issue: overfitting.
Most practitioners and data scientists keep overfitting in check by manually tuning the max_depth hyperparameter, either through trial and error or grid search.

That’s when I thought, “Why not automate this step?”

So, I built and published prunetree, a Python package that provides:

A drop-in DecisionTreeClassifier with built-in automatic pruning based on validation accuracy.

What Is prunetree?

PrunedDecisionTreeClassifier is a scikit-learn compatible estimator that:

  • Iteratively trains trees from depth 1 up to a max limit

  • Evaluates each on validation data

  • Selects the depth with the highest validation accuracy

  • Fits the final model using that depth
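The four steps above can be sketched with plain scikit-learn. This is only an illustration of the idea, not prunetree's actual source: the helper name select_best_depth, the max_depth_limit parameter, and the tie-breaking rule (prefer the shallowest tree) are all assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def select_best_depth(X_train, y_train, X_val, y_val, max_depth_limit=10, random_state=42):
    """Hypothetical helper: return (best_depth, best_accuracy) on the validation set."""
    best_depth, best_acc = 1, -1.0
    for depth in range(1, max_depth_limit + 1):
        # Train one tree per candidate depth
        tree = DecisionTreeClassifier(max_depth=depth, random_state=random_state)
        tree.fit(X_train, y_train)
        # Evaluate it on the validation data
        acc = accuracy_score(y_val, tree.predict(X_val))
        # Strict ">" keeps the shallowest depth when accuracies tie (an assumption)
        if acc > best_acc:
            best_depth, best_acc = depth, acc
    return best_depth, best_acc


X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

best_depth, best_acc = select_best_depth(X_train, y_train, X_val, y_val)

# Fit the final model using the selected depth
final_tree = DecisionTreeClassifier(max_depth=best_depth, random_state=42)
final_tree.fit(X_train, y_train)

print("Best depth selected:", best_depth)
print("Validation accuracy:", round(best_acc, 3))
```

Because the depth is chosen on the validation data, the validation accuracy is an optimistic estimate; a separate held-out test set gives a fairer final number.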

All you have to do is:

pip install prunetree


How to Use It:

Here’s how you can use it with your training and testing split:

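from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

from prunetree import PrunedDecisionTreeClassifier

# Load the Iris dataset as example data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The validation data drives the depth search
clf = PrunedDecisionTreeClassifier(
    prune=True,
    validation_data=(X_test, y_test),
    random_state=42,
)

clf.fit(X_train, y_train)

print("Best depth selected:", clf.best_depth)
# Note: this score is optimistic, since the same split was used to pick the depth
print("Test accuracy:", clf.score(X_test, y_test))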


Figure: depiction of the nodes and accuracy scores used for pruning.

When to Use It

Use PrunedDecisionTreeClassifier when:

  • You want a clean, fast Decision Tree without manually tuning max_depth

  • You have a fixed validation/test split

  • You prefer simplicity over full-blown grid search

Think of it as a decision tree that tunes itself (at least for depth!)
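For comparison, here is roughly what the grid search route over max_depth looks like with scikit-learn's GridSearchCV. It is a sketch under assumptions: the depth range of 1 to 10 and the 5-fold cross-validation are choices made for this example, not settings taken from prunetree.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate depths to try (assumed range for this example)
param_grid = {"max_depth": list(range(1, 11))}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation on the training data
    scoring="accuracy",
)
search.fit(X_train, y_train)

grid_best_depth = search.best_params_["max_depth"]
print("Depth chosen by grid search:", grid_best_depth)
# The refit best estimator is scored on data it never saw during the search
print("Held-out accuracy:", search.score(X_test, y_test))
```

Grid search trains one tree per depth per fold, so it costs more than a single validation pass, but it leaves the test split untouched for the final score.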


Why I Open-Sourced It

I built this as a personal project to learn Python packaging and real-world ML tooling. But then I realized others might find it useful too.

So, I made it open source.


What’s Next?

I’m planning to:

  • Add cross-validation based pruning

  • Add support for regression trees

  • Integrate into pipelines and grid search


Final Words:

If you’re someone who loves clean machine learning workflows, or just hates tuning max_depth manually, give prunetree a try.

Happy learning and keep building.
– Arun Sundar K
