this is really superb work. if you liked the Sonnet/Golden Gate stuff you'll like this too. they're open-sourcing their GPT-2 SAEs as well 😍 https://t.co/8Hg1guFg11
This is super cool work! Sparse autoencoders are currently the most promising approach to understanding how models "think" internally. This new paper demonstrates how to scale them to GPT-4 and beyond, completely unsupervised. A big step forward! https://t.co/jZ36peImDr
OpenAI's GPT-4 Surpasses Human Performance in Theory of Mind; OpenAI Identifies 16 Million Interpretable Features https://t.co/IIkWTEqNvc
On June 6, 2024, OpenAI introduced a technique for decomposing GPT-4 into 16 million interpretable features. The advance comes from improved methods for training sparse autoencoders at scale, which disentangle GPT-4's internal representations into features that often correspond to understandable concepts. The new methods scale better than existing work and are completely unsupervised, marking significant progress in understanding the neural activity of language models. Separately, GPT-4 has been reported to surpass human performance on Theory of Mind tests.
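For context on what these sparse autoencoders do mechanically: a wide autoencoder is trained to reconstruct a model's internal activations while only a few features are allowed to fire per input. Below is a minimal PyTorch sketch of a TopK-style SAE in that spirit. All dimensions, names, and training details here are illustrative assumptions for a toy setup, not OpenAI's released implementation (their GPT-4 run used up to 16 million latents).

```python
# Minimal sketch of a TopK sparse autoencoder (SAE) in PyTorch.
# All names and dimensions are illustrative assumptions, not OpenAI's code.
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_model: int, n_features: int, k: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)
        self.k = k  # at most k features may fire per input

    def forward(self, x: torch.Tensor):
        # Project the model activation into a much wider feature space.
        pre_acts = self.encoder(x)
        # Keep only the k largest pre-activations; zero out everything else.
        topk = torch.topk(pre_acts, self.k, dim=-1)
        acts = torch.zeros_like(pre_acts)
        acts.scatter_(-1, topk.indices, torch.relu(topk.values))
        # Reconstruct the original activation from the sparse feature vector.
        return self.decoder(acts), acts

# Toy usage: reconstruct stand-in activations with a plain MSE loss.
sae = TopKSAE(d_model=768, n_features=16_384, k=32)
x = torch.randn(8, 768)  # pretend these are residual-stream activations
recon, acts = sae(x)
loss = torch.mean((recon - x) ** 2)
loss.backward()
```

The k-sparsity is what makes the learned features inspectable: since each input activates only a handful of latents, individual features tend to align with recognizable concepts.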