In 2024, Google DeepMind introduced a series of advances in transformer-based language models focused on computational efficiency and performance. Mixture-of-Depths, developed by David Raposo, Sam Ritter, Blake Richards, and colleagues, allocates compute dynamically across a sequence: a lightweight per-layer router selects which tokens receive the full block computation, while the remaining tokens bypass the block through the residual connection (see the sketch below). The model thereby matches baseline performance with significantly fewer floating-point operations (FLOPs) per forward pass, addressing the inefficiency of standard transformers, which spread compute uniformly across the input even though not all tokens are equally difficult to predict. Google also announced Transformer 2, which integrates attention, recurrence, retrieval, and feedforward networks (FFN) into a single module, reportedly performing on par with a standard Transformer at up to 20 times better compute efficiency while processing contexts of up to 100 million tokens. Separately, Stanford University introduced Representation Finetuning (ReFT) for language models, which learns interventions on hidden representations rather than modifying weights and is 10 to 50 times more parameter-efficient than prior state-of-the-art parameter-efficient finetuning (PEFT) methods. Together, these innovations are significant steps toward AI systems with better computational performance and scalability, especially in applications where resources are limited or efficiency is paramount.
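To make the routing idea concrete, here is a minimal sketch in PyTorch. The `MoDBlock` wrapper, its `capacity_ratio` argument, and the sigmoid gating are illustrative assumptions for this sketch, not the paper's implementation:

```python
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Illustrative Mixture-of-Depths-style routing (not the paper's code).

    A scalar router scores every token; only the top-k tokens per sequence
    are processed by the wrapped block, while the rest skip it via the
    residual stream, cutting the block's FLOPs by roughly the capacity ratio.
    """

    def __init__(self, d_model: int, block: nn.Module, capacity_ratio: float = 0.5):
        super().__init__()
        self.router = nn.Linear(d_model, 1)  # one score per token
        self.block = block                   # e.g. an FFN or attention sublayer
        self.capacity_ratio = capacity_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        _, seq_len, d_model = x.shape
        k = max(1, int(seq_len * self.capacity_ratio))

        scores = self.router(x).squeeze(-1)       # (batch, seq_len)
        topk = scores.topk(k, dim=-1).indices     # tokens that receive compute
        idx = topk.unsqueeze(-1).expand(-1, -1, d_model)

        selected = x.gather(1, idx)               # (batch, k, d_model)

        # Gate the block output by the router score so routing decisions
        # receive gradient during training (a soft top-k trick).
        gate = torch.sigmoid(scores.gather(1, topk)).unsqueeze(-1)
        processed = selected + gate * self.block(selected)

        # Unselected tokens pass through unchanged via the residual stream.
        return x.scatter(1, idx, processed)


# Usage: process only half of the 16 tokens through the FFN sublayer.
ffn = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
mod = MoDBlock(d_model=64, block=ffn, capacity_ratio=0.5)
out = mod(torch.randn(2, 16, 64))   # same shape in and out: (2, 16, 64)
```

Lowering `capacity_ratio` directly lowers the wrapped block's cost, since the block only ever sees `k = capacity_ratio * seq_len` tokens per sequence.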
New paper! 🫡 We introduce Representation Finetuning (ReFT), a framework for powerful, efficient, and interpretable finetuning of LMs by learning interventions on representations. We match/surpass PEFTs on commonsense, math, instruct-tuning, and NLU with 10–50× fewer parameters. https://t.co/nFUHqpu5YV
MASSIVE Paper: "ReFT: Representation Finetuning for Language Models" 🔥
📌 10x-50x more parameter-efficient than prior state-of-the-art PEFT methods.
📌 A hallmark of current state-of-the-art PEFTs is that they modify weights rather than representations. However, much prior… https://t.co/N6GZ8I73l8
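The tweets above capture the core idea: learn small edits to hidden representations instead of updating weights. Below is a minimal sketch of a LoReFT-style intervention, assuming PyTorch; the class and variable names are illustrative, and the code is a simplified reading of the approach rather than the authors' released implementation:

```python
import torch
import torch.nn as nn

class LoReFTIntervention(nn.Module):
    """Sketch of a low-rank representation intervention in the spirit of LoReFT.

    The frozen base model's hidden state h is edited in a rank-r subspace:
        h <- h + R^T (W h + b - R h)
    Only R, W, and b are trained, so parameter counts stay far below
    weight-based PEFT methods such as LoRA.
    """

    def __init__(self, d_model: int, rank: int):
        super().__init__()
        self.R = nn.Parameter(torch.empty(rank, d_model))
        nn.init.orthogonal_(self.R)        # rows of R span the edit subspace
        self.W = nn.Linear(d_model, rank)  # learned projection (includes bias b)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (..., d_model) hidden states at the intervened layer/positions
        delta = self.W(h) - h @ self.R.T   # rank-r correction: (W h + b) - R h
        return h + delta @ self.R          # map the edit back into d_model


# Usage: a rank-4 intervention on 768-dim hidden states.
# Trainable parameters: R (4*768) + W (4*768 + 4) ≈ 6k in total.
iv = LoReFTIntervention(d_model=768, rank=4)
h = torch.randn(2, 10, 768)
print(iv(h).shape)  # torch.Size([2, 10, 768])
```

Because only `R` and `W` are learned while the base model stays frozen, a handful of such interventions at chosen layers and positions accounts for the 10-50× parameter savings the authors report relative to weight-based PEFTs.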
Google presents Transformer 2
- Unifies attention, recurrence, retrieval, FFN into a single module
- Performs on par with Transformer w/ 20x better compute efficiency
- Efficiently processes 100M context length
proj: https://t.co/sJn7V5O8qe
abs: https://t.co/oQcMPOQgQS