Explainability vs. Interpretability: A Tale of Two Questions


A few weeks ago, I stumbled upon Neel Nanda’s blog, where he talked about embracing imperfection in writing. He decided to publish raw, unpolished drafts, letting his inner thoughts spill onto the page. Inspired by that, I thought, why not try the same? Here I am, experimenting with publishing first drafts, raw thoughts, and minimal polish. Let’s see where this goes.
Enough of this rambling; let’s get back to the topic. For quite some time now, I have been interested in how we can build AI systems responsibly. Of course, the most obvious answer is to make them more explainable and interpretable. Yet distinguishing between these two terms can be tricky, even for experts. Quite a few papers and books use them interchangeably, which, honestly, seems reasonable. But when we’re talking about a specific system, say, a medical diagnostic model, which of the two should we prioritise? This is where the need to differentiate between them arises.
A simple way to differentiate between the two is by answering the ‘Why’ and the ‘How’, as is clear from this post.
Explainability answers why a model made a specific decision. For example, let’s say a Named Entity Recognition (NER) model tags “Paris” as a location. Tools like LIME or SHAP can highlight the words, such as “capital” or “France”, that influenced this decision. This is explainability: it provides post-hoc insight into the model’s output.
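To make that concrete, here is a minimal sketch of a post-hoc explanation using LIME’s text explainer. The `predict_proba` function below is a toy stand-in I made up (a real NER pipeline would have to be wrapped so it returns class probabilities per input text), so treat this as illustrative rather than a working NER explainer:

```python
# Minimal sketch: post-hoc explanation with LIME's text explainer.
# `predict_proba` is a hypothetical stand-in for a real model.
import numpy as np
from lime.lime_text import LimeTextExplainer

class_names = ["not-location", "location"]

def predict_proba(texts):
    # Toy model: the probability that "Paris" is a location goes up
    # when context words like "capital" or "France" are present.
    probs = []
    for t in texts:
        score = 0.5 + 0.2 * ("capital" in t) + 0.2 * ("France" in t)
        probs.append([1.0 - score, score])
    return np.array(probs)

explainer = LimeTextExplainer(class_names=class_names)
explanation = explainer.explain_instance(
    "Paris is the capital of France.",
    predict_proba,
    num_features=5,  # show the five most influential words
)
# Each entry is (word, weight): which words pushed the prediction
# towards "location". With the toy model above, "capital" and
# "France" come out on top.
print(explanation.as_list())
```

LIME never looks inside the model here; it only perturbs the input text and watches the output probabilities change, which is exactly why this counts as explainability rather than interpretability.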
Interpretability, on the other hand, focuses on how the model reached its decision. For a Transformer-based model like BERT, this might involve analysing attention weights to see which words the model prioritised, or visualising how hidden layers transform “Paris” into a vector that the model recognises as a location. Basically, interpretability digs into the model’s internal mechanics.
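Here is a minimal sketch of what poking at those internals can look like, pulling BERT’s attention weights with Hugging Face Transformers. The model choice and the averaging over heads are my own assumptions, not a canonical recipe, and attention weights are only a partial window into the mechanics:

```python
# Minimal sketch: inspecting BERT's attention weights for one sentence.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("Paris is the capital of France.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
last_layer = outputs.attentions[-1][0]   # (num_heads, seq_len, seq_len)
avg_attention = last_layer.mean(dim=0)   # average over heads (a simplification)

# How much does the "paris" token attend to every other token?
paris_idx = tokens.index("paris")        # this tokenizer lowercases input
for tok, weight in zip(tokens, avg_attention[paris_idx]):
    print(f"{tok:>10s}  {weight.item():.3f}")
```

Notice the difference from the LIME sketch: here we are reading values out of the model’s own layers rather than treating it as a black box.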
But then one might ask: is an explainable system always interpretable? Not necessarily. A model can explain its decisions without us fully understanding its internal workings. And here’s the kicker: even if we understand how a decision was made, it doesn’t guarantee we’ll agree with why it was made. This is one of the things that feels like an endless rabbit hole.
Maybe we can simplify it a bit more. Let’s start with this analogy:
Imagine you come across a safe with a digital lock. You punch in a random code, and voila! It opens. You’re amazed but also curious: how did this happen?
Explainability: Someone says, “The safe opened because the code you entered matched a pre-set combination.” This explains why it opened.
Interpretability: Now, suppose that same person opens the safe and shows you the inner workings: the gears, sensors, and electronics that verified your code and unlocked the mechanism. This reveals how the safe operates.
See the difference? Explainability gives you the reasoning; interpretability shows you the process. I admit, I still get confused sometimes, but this example helps me keep things somewhat straight.
Now, Why Do We Care?
Understanding the distinction between explainability and interpretability is crucial when designing AI systems, especially in high-stakes domains like healthcare, finance, or criminal justice. An explainable system can provide users with the rationale behind decisions, fostering trust. Meanwhile, an interpretable system ensures developers and regulators understand the underlying mechanics, enabling more effective debugging and bias mitigation. Both aspects play pivotal roles in building responsible AI.
Final Thoughts
We could go on and on about why we need to make our models more interpretable and explainable. One simple reason, as Christoph Molnar explains, is that the human mind is naturally curious. We want to understand why things happen, especially when they are unexpected. Why did the dog bite me, even though it has never shown aggression before? Why was my loan application rejected? For an AI system to be trusted by humans, even a modest explanation of its behaviour is essential.
But for now, we’ll stop here and save that for another post. If you’re curious in the meantime, I highly recommend Molnar’s book, Interpretable Machine Learning. Though it was originally published in 2019, it is still highly relevant today.