AI is often described as a black box - nobody truly understands how it works. But a new "brain scan" developed by researchers at Anthropic could be a solution to that problem: https://t.co/qbZXg7NeJZ
Hot take on a fascinating new paper on (partial) interpretability from @AnthropicAI: • The team was able to find (some) concept-like* “feature” representations for concepts ranging from the concrete to more abstract, from Golden Gate Bridge, to Secrecy, and Conflict of… https://t.co/I4NwxXcP5V
Here's some actual good news in AI! Researchers at Anthropic have made progress toward figuring out what goes on inside LLMs, identifying millions of "features" in Claude 3 that activate when specific concepts such as San Francisco, lithium, or deception are discussed. This…

Researchers at Anthropic have made significant progress in understanding the internal workings of large language models (LLMs), in particular Claude 3. Their new interpretability paper identifies millions of "features" in the model that activate when specific concepts, such as San Francisco, lithium, or deception, are discussed. These concept-like feature representations range from concrete entities like the Golden Gate Bridge to abstract notions such as secrecy and conflict. The technique, which decomposes the model's internal activations into interpretable components via dictionary learning, works rather like a "brain scan" for the network, offering a glimpse into operations that have long been treated as a black box.
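To make the "features" idea concrete, here is a minimal sketch of the sparse-autoencoder style of dictionary learning that underlies this kind of feature extraction: learn an overcomplete set of directions so that each captured activation is reconstructed from a small number of active features. This is an illustrative toy, not Anthropic's actual code; the dimensions, hyperparameters, and random stand-in activations are hypothetical.

```python
# Sketch: dictionary learning over LLM activations with a sparse autoencoder.
# All sizes and data below are illustrative placeholders.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)  # activation -> feature coefficients
        self.decoder = nn.Linear(n_features, d_model)  # feature coefficients -> reconstruction

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))  # sparse, non-negative feature activations
        x_hat = self.decoder(f)          # reconstruct the original activation
        return x_hat, f

# Hypothetical sizes: 512-dim activations, 4096 learned features.
sae = SparseAutoencoder(d_model=512, n_features=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3  # weight on the sparsity penalty (illustrative)

activations = torch.randn(1024, 512)  # stand-in for activations captured from one LLM layer
for step in range(100):
    x_hat, f = sae(activations)
    # Reconstruction error plus an L1 penalty that pushes most features to zero per input.
    loss = ((x_hat - activations) ** 2).mean() + l1_coeff * f.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, each decoder column is a candidate feature direction; inspecting which inputs most strongly activate a given feature is how features end up with human-readable labels like "Golden Gate Bridge" or "deception".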
