
A new architecture, 'Learning to (Learn at Test Time)' (TTT), has been developed for recurrent neural networks (RNNs), featuring expressive hidden states and linear complexity. Because the hidden state is itself a machine learning model, the layer 'learns' from its context at test time, replacing the Attention mechanism's costly KV cache. The architecture has shown better perplexity than Mamba. Developed over the last 1.5 years, the model scales effectively from 125M to 1.3B parameters and delivers significant improvements in long-context modeling, with the sequence modeling layers also trained on book-length data to further improve long-context performance.
Learning to (Learn at Test Time): RNNs with Expressive Hidden States https://t.co/g83JYT0kVk
🚨Learning to (Learn at Test Time): RNNs with Expressive Hidden States 🌟𝐏𝐫𝐨𝐣: https://t.co/0BXLYRgOUJ 🚀𝐀𝐛𝐬: https://t.co/4ANCg2z2He A new class of sequence modeling layers with linear complexity and an expressive hidden state https://t.co/zMaGui8suE
Proud to share what I've been working on for the past year, "Learning to (Learn at Test Time)"! Our new architecture trains a model to "learn" from its context, replacing Attention's costly KV cache with an expressive hidden state: the weights of an ML model!🤯 🧵by @karansdalal https://t.co/Dq3k4zt0DN
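To make the idea above concrete, here is a minimal NumPy sketch of a TTT-style layer in the spirit described: the hidden state is the weight matrix of a tiny linear model that takes one gradient step on a self-supervised reconstruction loss per token, giving constant cost per step and no KV cache. The projections (`theta_k`, `theta_v`, `theta_q`), the squared-error loss, and the learning rate are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ttt_linear_layer(tokens, dim, lr=0.1, rng=None):
    """Minimal sketch of a TTT-style sequence layer (simplified, not the paper's exact method).

    The hidden state is the weight matrix W of a small linear model.
    For each token, W takes one gradient step on a self-supervised
    reconstruction loss, then produces the layer's output. Cost per token
    is constant, so the full sequence is processed in linear time.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Assumed projections: in the real architecture these would be learned
    # during ("outer loop") training; here they are random for illustration.
    theta_k = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    theta_v = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    theta_q = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    W = np.zeros((dim, dim))           # hidden state = weights of a linear model
    outputs = []
    for x in tokens:                   # one token embedding at a time
        k, v, q = theta_k @ x, theta_v @ x, theta_q @ x
        # Self-supervised loss l(W) = ||W k - v||^2: "learn" from this token.
        grad = 2.0 * np.outer(W @ k - v, k)
        W = W - lr * grad              # inner-loop gradient step at test time
        outputs.append(W @ q)          # output uses the updated hidden state
    return np.stack(outputs)

# Usage: a toy sequence of 8 tokens with dimension 16.
out = ttt_linear_layer(np.random.default_rng(1).standard_normal((8, 16)), dim=16)
print(out.shape)  # (8, 16)
```

The key contrast with Attention is that nothing grows with sequence length: the only state carried forward is `W`, regardless of how many tokens have been seen.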
