The Sherlock of Machine Learning


Most machine learning models feel like guesswork in disguise. They look at enough data, do some math, and start making predictions. But if you ask why a model made a decision, you’re often met with silence.
Decision Trees are different.
They don’t guess. They investigate.
A decision tree is a supervised learning model - one that makes decisions by asking a series of structured yes/no questions.
It approaches the problem like a detective working a case. It doesn’t try to solve everything at once - it breaks things down, one question at a time.
“Was the suspect seen on camera?”
“Did they have a motive?”
“Do they match the eyewitness description?”
Each answer filters the pool of suspects. Each split makes the case a little clearer. With every step, the tree moves closer to a single, confident outcome.
It's not guessing. It's ruling things out - methodically, visibly, and on record.
How a Decision Tree Actually Makes Decisions
The investigation begins with a dataset of examples with features (like location, motive, evidence) and known outcomes (guilty or not). The tree’s job is to separate the guilty from the innocent. But instead of relying on a hunch, it systematically asks:
“What’s the one question I can ask right now that best separates the groups?”
It runs through all possible features and thresholds:
“Was the suspect near the scene?”
“Was their alibi confirmed?”
“Were their fingerprints found on-site?”
“Do they have prior burglary convictions?”
For each potential question, it simulates the split and evaluates how clean the resulting groups are. If a split leads to mostly one class on each side - mostly “guilty” here, mostly “innocent” there - it’s a strong lead.
To measure this, the model uses impurity metrics - ways of quantifying how mixed a group is:
Gini impurity: low when a node contains mostly one class.
Entropy: high when things are messy, low when they’re clear.
The question that reduces impurity the most gets selected. That becomes the tree’s first branch.
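To make that concrete, here's a minimal hand-rolled sketch of both metrics - my illustration, not scikit-learn's internals (it assumes NumPy is installed):
import numpy as np

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1 - np.sum(p ** 2))

def entropy(labels):
    # Entropy: sum of p * log2(1/p) over the classes present
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1 / p)))

mixed = [1, 0, 1, 0]   # half guilty, half innocent - as messy as it gets
solved = [1, 1, 1, 1]  # all guilty - no ambiguity left

print(gini(mixed), gini(solved))        # 0.5 0.0
print(entropy(mixed), entropy(solved))  # 1.0 0.0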
The data is split, and the process repeats on each subgroup. It keeps asking new questions, following the evidence, tightening the case.
This continues until:
Every remaining group is pure (no ambiguity),
Or the data runs out,
Or we’ve hit predefined limits - like max_depth (how deep the tree can go) or min_samples_split (the minimum number of samples required to split a node).
At the end of each path (a leaf node), the tree delivers its verdict. A new suspect walks in, and based on their answers, the model traces a path down the tree and predicts: guilty or not.
What makes decision trees compelling isn’t just that they work - it’s that you can watch the whole case unfold, step by step, from first question to final verdict.
Let’s make our detective
A simple burglary investigation. Each row is a suspect, and each column is a feature known at the time.
The goal: predict whether the suspect was involved.
# Step 1: Import the libraries and the model
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import matplotlib.pyplot as plt
# Step 2: Define the case data
data = {
    'was_near_scene': [1, 0, 1, 0, 1, 1],
    'alibi_verified': [0, 1, 0, 1, 0, 1],
    'fingerprints_found': [1, 0, 1, 0, 1, 0],
    'prior_conviction': [1, 0, 1, 0, 1, 0],
    'involved': [1, 0, 1, 0, 1, 0]
}
df = pd.DataFrame(data)
# Step 3: Separate features and labels
X = df.drop('involved', axis=1) # The evidence
y = df['involved'] # The verdict
# Step 4: Create the model using entropy to measure impurity
model = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42)
model.fit(X, y)
# Step 5: Visualize the tree
plt.figure(figsize=(12,6))
tree.plot_tree(model,
               feature_names=X.columns,
               class_names=['Innocent', 'Guilty'],
               filled=True,
               rounded=True)
plt.show()
# Step 6: Predicting for new suspects
# New suspects with known features
new_suspects = pd.DataFrame({
    'was_near_scene': [1, 0],
    'alibi_verified': [0, 1],
    'fingerprints_found': [1, 0],
    'prior_conviction': [0, 0]
})
# Step 7: Make predictions
predictions = model.predict(new_suspects)
for i, verdict in enumerate(predictions):
    label = "Guilty" if verdict == 1 else "Innocent"
    print(f"Suspect {i+1}: {label}")
This output shows how the model uses evidence to arrive at its verdict - replicating the same decision logic from the visualization.
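If you’d rather read the case file than look at the chart, scikit-learn can also print the same tree as plain-text rules - a small optional addition to the script above:
from sklearn.tree import export_text

# Prints each question, threshold, and verdict as an indented rule list
print(export_text(model, feature_names=list(X.columns)))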
When decision trees go too far
Decision Trees are easy to understand, fast to train, and - as we’ve seen - incredibly transparent. But for all their charm, they come with some serious baggage.
Especially when left unchecked.
They love to overfit
Decision Trees don’t just learn patterns - they memorize exceptions. If one suspect in your dataset got caught despite having a clean alibi, the model won’t generalize. It’ll just create a special branch to handle that case - as if it’s a rule, not an outlier.
This makes the tree perfect on training data, but brittle in the real world. One new data point, and it fails to see the bigger picture.
Unless you limit how deep the tree can go (max_depth) or how much evidence is required to make a new split (min_samples_split), it’ll keep branching until every leaf is a one-person rule.
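To watch it happen, here’s a small side-by-side sketch on synthetic, deliberately noisy data - not our burglary case (flip_y mislabels a chunk of rows on purpose):
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data: flip_y=0.2 mislabels roughly 20% of the rows
X, y = make_classification(n_samples=300, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

unlimited = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
limited = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=0).fit(X_train, y_train)

print("unlimited:", unlimited.score(X_train, y_train), unlimited.score(X_test, y_test))
print("limited:  ", limited.score(X_train, y_train), limited.score(X_test, y_test))
Typically the unlimited tree scores a perfect 1.0 on the training data and noticeably worse on the test data, while the limited one trades some training accuracy for a much smaller gap.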
They’re sensitive to slight changes
Add a few random rows to your dataset - maybe a few suspects who coincidentally matched the pattern but were innocent - and the tree might restructure itself entirely.
That’s because decision trees are sensitive to the exact data they see. One noisy observation can pull the structure in a different direction, especially with small datasets. The splits aren’t always stable, and that makes the logic harder to trust.
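You can check this yourself: fit the same tree twice, once on a full synthetic dataset and once with a handful of rows removed, and print both structures. A sketch - whether the root question actually changes depends on the data you generate:
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=60, n_features=4, random_state=1)

# Same settings, but the second tree never sees the first five rows
tree_full = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
tree_minus_five = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X[5:], y[5:])

print(export_text(tree_full))
print(export_text(tree_minus_five))
With a dataset this small, dropping five rows can be enough to change which question gets asked first - and then the whole case gets re-argued from the top.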
They can’t handle nuance
Decision Trees are great when the world is made of sharp yes/no boundaries. But in real-world problems - where features blend together and influence each other subtly - trees fall short.
If a prediction depends on a bit of this and some of that and how they interact, a tree struggles. It wants to split. But sometimes, the answer isn’t a clean split - it’s a curve. And trees don’t do curves.
This is why they don’t work well with images, audio, or any data where smooth variation matters.
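One quick way to see it: fit a regression tree to a smooth sine curve. The tree can only answer with a staircase of axis-aligned splits (a sketch using DecisionTreeRegressor):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor

X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()

# max_depth=3 means at most 8 leaves, so at most 8 distinct predictions
staircase = DecisionTreeRegressor(max_depth=3).fit(X, y)

plt.plot(X, y, label="the smooth truth")
plt.step(X.ravel(), staircase.predict(X), label="the tree's steps")
plt.legend()
plt.show()
More depth means smaller steps, but it’s steps all the way down - the tree never produces a curve.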
They make decisions too early
Decision trees are greedy. Not in the bad way - in the algorithmic way. At every step, they pick the split that looks best right now. They don’t think two steps ahead.
This means the tree might lock itself into a structure that seemed promising early on, but prevents better decisions later.
It’s like starting an investigation by assuming the suspect had a motive - and then only asking questions that follow from that assumption. If that early hunch was wrong, everything after it is compromised.
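The textbook illustration is XOR: a case where guilt depends on two clues together, so no single question looks good on its own. A sketch, reusing the hand-rolled entropy from earlier:
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1 / p)))

# XOR: guilty only when exactly one of the two clues is present
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

for feature in (0, 1):
    left, right = y[X[:, feature] == 0], y[X[:, feature] == 1]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
    print(f"feature {feature}: information gain = {entropy(y) - weighted}")
Both gains come out to 0.0: to a greedy learner, every first question looks equally useless - even though two questions asked in sequence would crack the case perfectly.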
Decision Trees stand out because they make their reasoning visible. They don’t just tell you what they predicted - they show you how. Every question, every split, every path is laid out like a paper trail.
That kind of transparency is rare in machine learning.
But as we’ve seen, that clarity comes with tradeoffs. Trees are prone to overfitting, sensitive to small shifts in data, and limited by the very logic that makes them easy to follow.
Still, they remain one of the best tools to start with - especially when interpretability matters.
And when their weaknesses start showing? That’s when we bring in the reinforcements: ensembles like Random Forests and Gradient Boosted Trees. More complex, yes - but built on the same idea: ask smart questions, split the data, and follow the trail.
Because sometimes, one detective isn’t enough. You need a whole squad.
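As a teaser, here’s what the squad looks like in code - a minimal sketch reusing X, y, and new_suspects from the investigation above:
from sklearn.ensemble import RandomForestClassifier

# 100 detectives, each trained on a different view of the evidence, voting together
squad = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
print(squad.predict(new_suspects))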
Written by

Saanvi Kumar
Hey, I’m Saanvi 👋 This blog is mostly me talking to myself - trying to make sense of whatever I’m learning right now. It could be a bug I spent hours fixing, a concept that finally clicked, or just something cool I came across. Writing it down helps me remember, reflect, and sometimes even feel like I know what I’m doing. If you’re here and figuring things out too - welcome to the mess.
Hey, I’m Saanvi 👋 This blog is mostly me talking to myself - trying to make sense of whatever I’m learning right now. It could be a bug I spent hours fixing, a concept that finally clicked, or just something cool I came across. Writing it down helps me remember, reflect, and sometimes even feel like I know what i’m doing. If you’re here and figuring things out too - welcome to the mess.