Interpretability vs Explainability in Machine Learning: What They Are and Why They Matter

Mahugnon DOUSSO

I’ve built machine learning models — not thousands, but enough to notice something not-so-great about myself: I was obsessed with accuracy.

And I was trying everything I could to get that perfect score: tweaking features, tuning parameters, etc. But at some point, I had to admit that I didn’t always fully understand what I was doing. The more I tried, the more I got lost. Fine-tuning was becoming harder and harder. I started to wonder: If I am not able to understand how my model works, can I trust its outcomes? If I can’t explain the “why” behind a prediction, how can I improve it? Or defend it confidently in front of a stakeholder?

Whether a model is highly accurate or not, understanding why and how it came to its conclusions is crucial. As machine learning and AI algorithms become more deeply involved in companies’ decision-making processes, the ability to interpret and explain their decisions and outcomes has become essential.

What are those? Interpretability? Explainability?

Interpretability is about being able to follow the logic behind the prediction. It is the degree to which we can understand how a model makes its predictions — for example, by looking at weights, structures, or rules.
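
For instance, with a linear model you can read the learned weights directly. Here is a minimal sketch of that idea, assuming scikit-learn is installed; the dataset and model choice are purely illustrative and not taken from any project in this article.

```python
# Minimal sketch: reading the weights of an intrinsically interpretable model.
# Assumes scikit-learn; the breast cancer dataset is only an illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Scale the features so the coefficients are comparable to one another.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Each coefficient shows how strongly a feature pushes the prediction
# towards one class or the other.
coefs = model.named_steps["logisticregression"].coef_[0]
ranked = sorted(zip(X.columns, coefs), key=lambda t: abs(t[1]), reverse=True)
for name, coef in ranked[:5]:
    print(f"{name}: {coef:+.3f}")
```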

Explainability goes a step further: it’s being able to explain why that specific outcome happened, in plain human terms.

These concepts help shed light on black-box models — models so complex (like deep neural networks or ensemble methods) that their internal workings are nearly impossible to grasp without additional tools.

Yeah, okay, cool but do they really matter that much?

Here’s a little story. When I built my first machine learning model, I was working on a large sentiment analysis project for a company, and I kept going back to the labelling step because comments can be a little tricky and I wanted to make sure I had high-quality data. Despite my efforts, I had a hard time teaching my model to recognise bad comments. I had labeled thousands of comments. So why wasn’t the model getting it? The answer wasn’t in the data — it was in the model’s logic. And that’s when I realised: I didn’t just need better data. I needed more visibility into the model’s decisions. Only then would I be able to fix it. Explainable AI is crucial for:

  • Transparency and Trust: Stakeholders, clients, and decision-makers need more than metrics — they need to understand how the system works to believe in it. Explainable AI ensures stakeholders understand the models’ decision-making process.

  • Fairness: These techniques help verify that the model’s outcomes are free of bias and fair to all parties involved.

  • Accountability: It gives ground to take responsibility for what our models do, especially in domains like finance, healthcare, or HR.

  • Debugging & Improvement: When a model underperforms or behaves oddly, interpretability gives us the tools to go deeper. Which features is the model relying on too heavily? Are there hidden patterns that don’t make sense? Understanding this helps improve the model faster and with more intention.

There are plenty of tools and techniques that can be implemented to interpret or explain a model. Let’s take a look at some of them.

Interpretability techniques

There are two ways of approaching interpretability: through intrinsic interpretability (using inherently interpretable models) and post-hoc interpretability (using techniques to explain more complex models after training).

  • Intrinsic interpretability: this approach relies on models that are interpretable by design, such as:

    • Logistic and Linear Regression: these algorithms produce coefficients that are easy to interpret, since each one shows how strongly a feature pushes the prediction up or down.

    • Decision trees: these models make predictions by asking a sequence of questions at different nodes. At each node, the data is split based on a specific feature, gradually filtering it down until a final decision is reached at a leaf node. The decision-making process can be followed by tracing the tree.

    • Rule-based models: these models make predictions based on pre-established rules (if a condition holds, then predict a given outcome).

  • Post-hoc interpretability: it uses tools to interpret a model after it has been trained (a short sketch follows this list).

    • LIME (Local Interpretable Model-agnostic Explanations): to put it simply, it fits an interpretable model (such as Logistic Regression) locally, around a specific prediction made by a black-box model. It helps us understand that prediction by approximating the black-box model's behaviour near its decision boundary.

    • SHAP (SHapley Additive exPlanations): it offers unified explanations based on game theory. SHAP values help you understand and evaluate the weight carried by each feature in the model’s predictions: they represent how much each feature contributes to the overall result. In short, SHAP tells you the impact of each variable on the predictions.

    • Partial Dependence Plot: this technique shows how the predicted outcome changes as one feature varies, while averaging out the effect of the other features. It helps you evaluate how much a single feature influences the predictions.

    • Feature permutation: this technique measures the importance of a feature by shuffling its values and observing whether the model’s performance drops. If it does, the feature plays an important role in the model’s predictions.
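
To make these post-hoc techniques more concrete, here is a rough sketch that applies SHAP, a partial dependence plot, and feature permutation to a black-box model. It assumes the shap, scikit-learn, and matplotlib packages; the dataset and model are illustrative only, not the ones from my sentiment analysis project.

```python
# Rough sketch: post-hoc interpretability on a black-box model.
# Assumes shap, scikit-learn and matplotlib are installed; the dataset is illustrative.
import matplotlib.pyplot as plt
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay, permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
black_box = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

# SHAP: how much does each feature contribute to individual predictions?
shap_values = shap.TreeExplainer(black_box).shap_values(X_test)
# Depending on the shap version this is a list (one array per class) or a 3-D array;
# keep the values for the positive class either way.
sv = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]
mean_abs_shap = np.abs(sv).mean(axis=0)
print("Top SHAP features:",
      sorted(zip(X.columns, mean_abs_shap), key=lambda t: t[1], reverse=True)[:5])

# Partial dependence: how does the prediction change as one feature varies?
PartialDependenceDisplay.from_estimator(black_box, X_test, features=["mean radius"])
plt.show()

# Feature permutation: does performance drop when a feature's values are shuffled?
perm = permutation_importance(black_box, X_test, y_test, n_repeats=10, random_state=42)
print("Top permutation features:",
      sorted(zip(X.columns, perm.importances_mean), key=lambda t: t[1], reverse=True)[:5])
```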

Explainability techniques

Explainability, as we mentioned before, is more about being able to point out to stakeholders which key features and patterns influence the predictions. The techniques here are built around this goal. They can be categorised into two groups: feature importance and model-agnostic techniques.

  • Feature Importance: these techniques help us identify which features influence the model’s predictions the most. Among them, we can name:

    • Random Forest: it provides feature importance scores based on how much each feature improves the splits across the trees of the forest.

    • Gradient Boosting: this technique exposes similar feature importance scores, but it uses boosting to combine weak learners (typically shallow trees) into a stronger model.

  • Model-Agnostic techniques: these techniques can be applied to any machine learning model. Among those, we can name:

    • Global Surrogate Models: Concept-wise, this is very similar to LIME. The key difference is that instead of being local, this technique provides insight into the overall decision process by fitting a simpler, interpretable model that approximates the complex model’s predictions (see the sketch after this list).

    • Counterfactual Explanations: A counterfactual explanation simply provides an answer to the question, “What could have changed the outcome?”
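
Here is a rough sketch of these two ideas, combining the random-forest feature importances mentioned above with a global surrogate. It assumes scikit-learn; the dataset and the choice of models are illustrative only.

```python
# Rough sketch: feature importance and a global surrogate model.
# Assumes scikit-learn is installed; the dataset and models are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# 1) Feature importance straight from a Random Forest.
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)
ranked = sorted(zip(X.columns, forest.feature_importances_), key=lambda t: t[1], reverse=True)
print("Most influential features:", ranked[:5])

# 2) Global surrogate: fit a small, readable tree on the black box's own predictions,
#    then read the surrogate's rules as an approximation of the forest's behaviour.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=42)
surrogate.fit(X, forest.predict(X))
print(export_text(surrogate, feature_names=list(X.columns)))

# How faithful is the surrogate to the black box it approximates?
print("Fidelity:", surrogate.score(X, forest.predict(X)))
```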

As you can see, all of these techniques share a similar foundation: they probe or approximate the complex model until its behaviour becomes easier to follow.

Just as there are many interpretability and explainability techniques, there are many tools available today to help open up the black box — each with its own strengths. Some are model-agnostic, meaning they can work with any type of model. Others include optimised methods for specific model types (like decision trees or neural networks).

Here's a quick overview of the most popular tools used in the field to interpret and explain machine learning models:

  • Eli5: a Python library that helps explain machine learning models and their predictions. It supports both linear models and tree-based models, with human-readable outputs.

  • InterpretML: an open-source library that provides interpretable models and model-agnostic techniques like SHAP and LIME.

  • LIME and SHAP: these tools (covered above) work with any model, but SHAP ships fast, optimised implementations for tree-based models like XGBoost or Random Forest. A short LIME sketch follows this list.

  • Alibi: This Python library offers model-agnostic explanation techniques, with additional tools for drift detection and fairness analysis.
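
As a taste of what these libraries look like in practice, here is a rough sketch that uses the lime package to explain a single prediction of a black-box classifier. Only the LimeTabularExplainer API comes from the library itself; the dataset and model are illustrative.

```python
# Rough sketch: explaining one prediction with the lime package.
# Assumes lime and scikit-learn are installed; the dataset and model are illustrative.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=42)
model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain the first test instance: which features pushed the prediction where?
explanation = explainer.explain_instance(X_test[0], model.predict_proba, num_features=5)
for feature_rule, weight in explanation.as_list():
    print(f"{feature_rule}: {weight:+.3f}")
```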

Interpretability and explainability are powerful tools — but how do you actually use them when your model isn’t performing well? What happens when you need to fine-tune, improve, or even defend your model in a real-world scenario?

This will be the subject of the next article, where we will use interpretability and explainability to understand and improve a model I had previously built. If you’ve ever wondered how to make your model both smarter and more trustworthy — stay tuned.

