
Google's Mixture-of-Depths method lets transformer-based language models allocate compute dynamically: rather than spending the same amount of compute on every token, the model learns to route compute to specific positions in a sequence, improving efficiency, scalability, and resource utilization.
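
As a rough illustration of how per-position compute allocation can look in practice, here is a minimal PyTorch sketch of top-k token routing in the spirit of Mixture-of-Depths. The MoDBlock wrapper, its capacity parameter, and the sigmoid gating are my own simplifications for illustration, not the paper's reference implementation.

```python
# Minimal sketch of Mixture-of-Depths-style top-k token routing (my reading of
# the idea, not the paper's code). A per-block router scores every token and only
# the top-k tokens per sequence pass through the wrapped block's computation; the
# rest skip it via the residual stream, capping the block's per-sequence FLOPs.
import torch
import torch.nn as nn


class MoDBlock(nn.Module):
    def __init__(self, block: nn.Module, d_model: int, capacity: float = 0.125):
        super().__init__()
        self.block = block              # any module mapping (B, T, D) -> (B, T, D)
        self.router = nn.Linear(d_model, 1)
        self.capacity = capacity        # fraction of tokens routed through the block

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        k = max(1, int(self.capacity * T))
        scores = self.router(x).squeeze(-1)            # (B, T) routing logits
        topk = scores.topk(k, dim=-1).indices          # positions routed into the block
        idx = topk.unsqueeze(-1).expand(-1, -1, D)     # (B, k, D) gather index
        routed = torch.gather(x, 1, idx)               # selected tokens only
        # Weight the block output by the sigmoid router score so the routing
        # decision stays differentiable for the selected tokens.
        gate = torch.sigmoid(torch.gather(scores, 1, topk)).unsqueeze(-1)
        out = x.clone()
        out.scatter_add_(1, idx, gate * self.block(routed))
        return out                                     # unrouted tokens pass through unchanged
```

With a capacity of 0.125, each wrapped block computes over only 12.5% of the tokens, which is where the compute savings come from; the router decides which positions get that budget.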

[CL] Training LLMs over Neurally Compressed Text - B Lester, J Lee, A Alemi, J Pennington, A Roberts, J Sohl-Dickstein, N Constant [Google DeepMind] (2024) https://t.co/Iy9SB7vQXv - Training large language models (LLMs) over highly compressed text yields advantages in training and… https://t.co/9Bj1kcTOyW
Ever wonder why we don’t train LLMs over highly compressed text? Turns out it’s hard to make it work. Check out our paper for some progress that we’re hoping others can build on. https://t.co/mceqpUfZQo With @blester125, @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd
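
To make the pipeline concrete, here is a small, heavily simplified sketch of the mechanical step such work relies on: compress text, then re-chunk the compressed bitstream into discrete token ids an LM could be trained over. zlib stands in for the neural compressor (the paper's setup, as I understand it, drives arithmetic coding with a smaller LM), and compress_to_tokens / bits_per_token are illustrative names, not anything from the paper.

```python
# Illustrative sketch only: zlib is a placeholder for a *neural* compressor, so
# that the surrounding pipeline -- compress, chunk the bitstream into fixed-size
# "tokens", train an LM over those tokens -- is concrete and runnable.
import zlib


def compress_to_tokens(text: str, bits_per_token: int = 8) -> list[int]:
    """Compress text and re-chunk the resulting bitstream into LM token ids."""
    payload = zlib.compress(text.encode("utf-8"))            # stand-in compressor
    bitstream = "".join(f"{byte:08b}" for byte in payload)   # flatten to raw bits
    # Pad so the bitstream divides evenly into bits_per_token-sized chunks,
    # then read each chunk as an integer token id in [0, 2**bits_per_token).
    pad = (-len(bitstream)) % bits_per_token
    bitstream += "0" * pad
    return [
        int(bitstream[i : i + bits_per_token], 2)
        for i in range(0, len(bitstream), bits_per_token)
    ]


tokens = compress_to_tokens("Why don't we train LLMs over highly compressed text?")
print(len(tokens), tokens[:8])   # with bits_per_token=8, each token id is in [0, 256)
```

The appeal is that a stronger compressor packs more text into each training token; the catch, as the thread notes, is that the resulting token stream is much harder for the LLM to learn from.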
Localization in LLMs is often mentioned. But do localization methods actually localize correctly? In our #NAACL2024 paper, we (w/ @_jessethomason_, @robinomial) develop two benchmarks to directly evaluate how well 5 existing methods can localize memorized data in LLMs. https://t.co/S9CIXlt1xR
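
For intuition, one generic way to test whether a localization method found the right weights is a knockout check: ablate the units it flags and see whether the targeted memorized sequence disappears while control sequences survive. The sketch below assumes Hugging Face-style model/tokenizer interfaces, and ablate_neurons is a hypothetical placeholder; this is not necessarily either of the paper's two benchmarks.

```python
# Generic "knockout" check for a localization method: ablate the neurons it
# attributes to one memorized sequence, then verify that greedy decoding no
# longer reproduces that sequence while other memorized (control) sequences
# are unaffected. `model`, `tokenizer`, and `ablate_neurons` are assumed
# interfaces / placeholders for a real setup.
import torch


@torch.no_grad()
def still_memorized(model, tokenizer, prompt: str, continuation: str) -> bool:
    """Greedy-decode from the prompt and check for verbatim reproduction."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    target = tokenizer(continuation, add_special_tokens=False).input_ids
    out = model.generate(ids, max_new_tokens=len(target), do_sample=False)
    return out[0, ids.shape[1]:].tolist() == target


def knockout_score(model, tokenizer, located_neurons, memorized, control):
    """Fraction of targeted sequences forgotten vs. control sequences kept."""
    ablate_neurons(model, located_neurons)   # hypothetical helper: zero the flagged units
    forgotten = sum(not still_memorized(model, tokenizer, p, c) for p, c in memorized)
    kept = sum(still_memorized(model, tokenizer, p, c) for p, c in control)
    return forgotten / len(memorized), kept / len(control)
```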