LLMs are widely criticised, and even feared, for the lack of explainability in their outputs and their poor interpretability. Even when we peek into these black boxes and observe the steps taken to generate the final response, we don’t understand how they ‘think’. This flaw has been a major showstopper for many real-world applications. For example, suppose a bank uses LLMs to decide whether a loan should be approved, but cannot explain the exact reasons behind the decision to the customer or to regulators. That would put the bank in a very awkward position.

Fast-forward to June 2025, three years after LLMs made the headlines: leading LLM providers started to expose LLMs’ internal reasoning processes, hoping to ease concerns about using LLMs in critical decision-making. Google announced Thought Summaries, and Anthropic introduced an open-source tool that traces the thoughts of LLMs with attribution graphs.

References