Recent work on large language models (LLMs) has focused on improving how they handle long-context inputs. The Pluto and Charon (PAC) framework accelerates LLM fine-tuning by up to 8.64x while cutting the memory footprint by 88.16%. GemFilter uses an LLM's early layers as a filter to select and compress input tokens, achieving a 1000x reduction in context length. A third approach freezes the LLM entirely and fine-tunes it by learning a binary 2:4 structured sparsity mask for each linear layer. Together, these methods aim to make long-context processing more efficient and scalable. Separately, a multilingual evaluation of long-context retrieval and reasoning found significant performance drops as context length grows, when multiple target sentences must be retrieved, and for lower-resource languages. The sparsity-mask work has been accepted as a NeurIPS spotlight.
Introducing 💎GemFilter💎, a simple, training-free and broadly applicable approach to address the long-context bottleneck, accelerating #LLM inference and reducing GPU memory consumption with 1000x token reduction 📘 Paper: https://t.co/qGiU9VyypG 🧠 Code: https://t.co/tUtCshkXgy https://t.co/GjB3z0yeps
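The core trick, as described, is to let an early transformer layer act as the filter: the attention that the final query token pays to the context at that layer ranks the tokens, and only the top-ranked ones are fed through the full model. Below is a minimal, illustrative PyTorch sketch of that selection step, assuming attention weights from one early layer are available; the function name, layer choice, and keep budget are assumptions, not GemFilter's actual code.

```python
import torch

def select_context_tokens(early_attn: torch.Tensor, keep: int) -> torch.Tensor:
    """Pick the context tokens the last query token attends to most.

    early_attn: attention weights from one early layer,
                shape (num_heads, seq_len, seq_len).
    keep:       number of tokens to retain.
    Returns the kept token indices in their original order.
    """
    # Average over heads, then take the last query row: how strongly the
    # final token attends to every position in the context.
    scores = early_attn.mean(dim=0)[-1]               # (seq_len,)
    top = scores.topk(min(keep, scores.numel())).indices
    return top.sort().values                          # preserve token order

# Toy usage with random "attention" standing in for a real early layer.
num_heads, seq_len = 8, 1024
attn = torch.rand(num_heads, seq_len, seq_len).softmax(dim=-1)
kept = select_context_tokens(attn, keep=128)
print(kept.shape)  # torch.Size([128]) -- the compressed input positions
```

In the method as described, only the first few layers would be run to produce these scores, and the full model would then process just the selected tokens, which is where the inference speedup and GPU memory savings come from.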
Multilingual Evaluation of Long Context Retrieval and Reasoning: Evaluates long-context LLMs across five languages, revealing significant performance drops with increased context length, multiple target sentences, and lower-resource languages. 📝https://t.co/oRQkt27hYt https://t.co/W2RfB01AjJ
🚀 @NeurIPSConf Spotlight! 🥳 Imagine fine-tuning an LLM with just a sparsity mask! In our latest work, we freeze the LLM and use 2:4 structured sparsity to learn binary masks for each linear layer. Thanks to NVIDIA Ampere’s 2:4 sparsity, we can achieve up to 2x compute… https://t.co/la6fxUpxOM
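A minimal sketch of how such mask-only fine-tuning could look in PyTorch: the dense weights stay frozen, a learnable score per weight drives a hard 2:4 mask (two survivors in every group of four along the input dimension), and a straight-through estimator lets gradients reach the scores. The module name, score initialization, and STE details here are assumptions for illustration; the actual work relies on NVIDIA Ampere's 2:4 sparse kernels for the compute savings, which this toy version does not use.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Masked24Linear(nn.Module):
    """Frozen linear layer pruned by a learnable 2:4 binary mask."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        assert linear.in_features % 4 == 0, "in_features must be divisible by 4"
        # Frozen copies of the original weights and bias.
        self.weight = nn.Parameter(linear.weight.detach().clone(), requires_grad=False)
        self.bias = None if linear.bias is None else nn.Parameter(
            linear.bias.detach().clone(), requires_grad=False)
        # One learnable score per weight, initialized from weight magnitude.
        self.scores = nn.Parameter(self.weight.abs().clone())

    def _mask_2_4(self) -> torch.Tensor:
        out_f, in_f = self.scores.shape
        groups = self.scores.view(out_f, in_f // 4, 4)
        # Keep the top-2 scores in every group of 4 (the 2:4 pattern).
        idx = groups.topk(2, dim=-1).indices
        hard = torch.zeros_like(groups).scatter_(-1, idx, 1.0).view(out_f, in_f)
        # Straight-through estimator: hard mask forward, identity gradient back.
        return hard + self.scores - self.scores.detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight * self._mask_2_4(), self.bias)

# Usage: wrap each nn.Linear in the frozen LLM and train only the scores.
masked = Masked24Linear(nn.Linear(8, 4))
opt = torch.optim.AdamW([masked.scores], lr=1e-3)
loss = masked(torch.randn(2, 8)).pow(2).mean()
loss.backward()
opt.step()
```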