
A new architecture, 'Learning to (Learn at Test Time)' (TTT), has been developed for recurrent neural networks (RNNs), featuring expressive hidden states and linear complexity. Because the hidden state is itself a machine learning model, the layer 'learns' from its context at test time, replacing the Attention mechanism's costly KV cache. The architecture has shown better perplexity than Mamba. Developed over the last 1.5 years, the model scales effectively from 125M to 1.3B parameters and delivers significant improvements in long-context modeling, with the sequence modeling layers also trained on book-length data to further improve long-context performance.
Learning to (Learn at Test Time): RNNs with Expressive Hidden States https://t.co/g83JYT0kVk
🚨Learning to (Learn at Test Time): RNNs with Expressive Hidden States 🌟𝐏𝐫𝐨𝐣: https://t.co/0BXLYRgOUJ 🚀𝐀𝐛𝐬: https://t.co/4ANCg2z2He A new class of sequence modeling layers with linear complexity and an expressive hidden state https://t.co/zMaGui8suE
Proud to share what I've been working on for the past year, "Learning to (Learn at Test Time)"! Our new architecture trains a model to "learn" from its context, replacing Attention's costly KV cache with an expressive hidden state: the weights of an ML model!🤯 🧵by @karansdalal https://t.co/Dq3k4zt0DN
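To make the idea above concrete, here is a minimal NumPy sketch of a TTT-style layer in the spirit described: the hidden state is the weight matrix of a tiny linear model that takes one gradient step on a self-supervised reconstruction loss per token, giving constant cost per step and no KV cache. The projections (`theta_k`, `theta_v`, `theta_q`), the squared-error loss, and the learning rate are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ttt_linear_layer(tokens, dim, lr=0.1, rng=None):
    """Minimal sketch of a TTT-style sequence layer (simplified, not the paper's exact method).

    The hidden state is the weight matrix W of a small linear model.
    For each token, W takes one gradient step on a self-supervised
    reconstruction loss, then produces the layer's output. Cost per token
    is constant, so the full sequence is processed in linear time.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Assumed projections: in the real architecture these would be learned
    # during ("outer loop") training; here they are random for illustration.
    theta_k = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    theta_v = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    theta_q = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    W = np.zeros((dim, dim))           # hidden state = weights of a linear model
    outputs = []
    for x in tokens:                   # one token embedding at a time
        k, v, q = theta_k @ x, theta_v @ x, theta_q @ x
        # Self-supervised loss l(W) = ||W k - v||^2: "learn" from this token.
        grad = 2.0 * np.outer(W @ k - v, k)
        W = W - lr * grad              # inner-loop gradient step at test time
        outputs.append(W @ q)          # output uses the updated hidden state
    return np.stack(outputs)

# Usage: a toy sequence of 8 tokens with dimension 16.
out = ttt_linear_layer(np.random.default_rng(1).standard_normal((8, 16)), dim=16)
print(out.shape)  # (8, 16)
```

The key contrast with Attention is that nothing grows with sequence length: the only state carried forward is `W`, regardless of how many tokens have been seen.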
