A series of 12 groundbreaking research papers on improving large language model (LLM) performance was published within the first 50 days of 2025, exploring new methods to extend context length and refine the core architecture of LLMs.

Key innovations include DarwinLM, which uses evolutionary structured pruning to achieve a 2x reduction in model size with only a 3% performance loss across various tasks. The approach combines an evolutionary search over candidate model substructures with a lightweight training procedure to identify the best-performing pruned configurations (a toy version of this loop is sketched below).

Another advance is the InfiniteHiP framework, which extends LLM context length to 3 million tokens on a single GPU. It achieves this through hierarchical token pruning, adaptive adjustments to Rotary Position Embeddings (RoPE), and efficient memory management within the SGLang serving system.

LongRoPE, a technique that modifies RoPE to handle context windows beyond 2 million tokens, has also been introduced. It scales down high-frequency components of the RoPE embeddings as context length increases, maintaining low perplexity across evaluation lengths and achieving over 90% accuracy on tasks requiring long contexts (a simplified frequency-rescaling sketch follows the pruning example below). This body of research also explores the concept of Intrinsic Space, scaling laws, optimal context length, and theoretical bounds on context-length scaling.
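To make the evolutionary-pruning idea concrete, here is a minimal sketch of that kind of search loop: keep a population of per-layer pruning configurations, score each one with a cheap fitness proxy (standing in for the paper's lightweight training step), and evolve the fittest. The fitness function, mutation scheme, and sizes below are invented for illustration and are not DarwinLM's actual procedure.

```python
# Toy evolutionary structured pruning: search over per-layer keep ratios.
import random

NUM_LAYERS = 12
TARGET_KEEP = 0.5            # aim for roughly a 2x size reduction

def random_config() -> list[float]:
    """A candidate: the fraction of units to keep in each layer."""
    return [random.uniform(0.3, 1.0) for _ in range(NUM_LAYERS)]

def mutate(cfg: list[float]) -> list[float]:
    """Small Gaussian perturbation of a parent's keep ratios."""
    return [min(1.0, max(0.1, k + random.gauss(0, 0.05))) for k in cfg]

def fitness(cfg: list[float]) -> float:
    """Stand-in for 'lightweight training + evaluation': reward configs that
    hit the target overall sparsity while keeping earlier layers denser."""
    size_penalty = abs(sum(cfg) / NUM_LAYERS - TARGET_KEEP)
    quality_proxy = sum(k * (NUM_LAYERS - i) for i, k in enumerate(cfg))
    return quality_proxy - 50.0 * size_penalty

population = [random_config() for _ in range(16)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    survivors = population[:4]                          # selection
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(12)]       # mutation

best = max(population, key=fitness)
print("best per-layer keep ratios:", [round(k, 2) for k in best])
```

In a real system the fitness call would fine-tune and evaluate the pruned model briefly rather than score a synthetic proxy, which is where most of the compute goes.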
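And for the RoPE-scaling idea, the sketch below shows the general mechanism of rescaling rotation frequencies so that positions beyond the training length map back into the trained range. The uniform scale factor, `train_len`, `target_len`, and `head_dim` here are illustrative assumptions; LongRoPE's actual per-dimension scale factors are found by search and are not reproduced.

```python
# Simplified RoPE frequency rescaling for context extension (not LongRoPE's
# exact method): divide the inverse frequencies by a scale factor when the
# target context exceeds the training context, slowing the rotations.
import numpy as np

def rope_frequencies(head_dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE inverse frequencies, one per pair of dimensions."""
    return base ** (-np.arange(0, head_dim, 2) / head_dim)

def scaled_rope_angles(positions: np.ndarray, head_dim: int,
                       train_len: int, target_len: int) -> np.ndarray:
    """Rotation angles with frequencies scaled down for longer contexts."""
    inv_freq = rope_frequencies(head_dim)
    scale = max(target_len / train_len, 1.0)    # > 1 when extending context
    inv_freq = inv_freq / scale                 # slower rotation => longer reach
    return np.outer(positions, inv_freq)        # shape: (seq_len, head_dim // 2)

angles = scaled_rope_angles(np.arange(8192), head_dim=128,
                            train_len=4096, target_len=8192)
cos, sin = np.cos(angles), np.sin(angles)       # applied to query/key pairs
```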
We’re not yet at the point where a single LLM call can solve many of the most valuable problems in production. As a consequence, practitioners frequently deploy compound AI systems composed of multiple prompts and sub-stages, often with multiple calls per stage. These systems'… https://t.co/WpBuTYkf3B
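As a rough illustration of what such a compound system looks like in code, the sketch below chains several prompt stages and makes multiple model calls inside one stage. `call_llm` is a hypothetical stand-in for whatever client a deployment actually uses, not a real API.

```python
# Minimal compound-AI-system sketch: plan -> multiple drafts -> final merge.
def call_llm(prompt: str) -> str:
    """Hypothetical model call; replace with a real client in practice."""
    return f"<model output for: {prompt[:40]}...>"

def answer_question(question: str, num_drafts: int = 3) -> str:
    # Stage 1: decompose the task into a plan.
    plan = call_llm(f"Break this question into steps: {question}")
    # Stage 2: multiple calls within one stage (draft several candidates).
    drafts = [call_llm(f"Answer step-by-step using this plan:\n{plan}\n\nQ: {question}")
              for _ in range(num_drafts)]
    # Stage 3: a final call selects and refines among the drafts.
    return call_llm("Pick the best answer and refine it:\n" + "\n---\n".join(drafts))

print(answer_question("Why does extending RoPE help long-context attention?"))
```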
The only benchmark that matters from now on. I love it. https://t.co/GhkVKxgv3R