Interpretability has become something of a holy grail in artificial intelligence research. Researchers at leading AI labs, such as OpenAI (e.g., Bills et al., 2023; Wu et al., 2023) and Anthropic (e.g., Bricken et al., 2023;...