Yeah, that's what I'm talking about. A 1M context window open-source LLM on the market from Shanghai. 🔥👀 https://t.co/voFmozPIt6
2M context window LLMs are already a fact today. 100M context windows are at an advanced work-in-progress stage, and some popular papers from big labs have found ways to reach a 1B-token context window. So RAG's long-term (5+ years) future is uncertain. https://t.co/MpVRLN9yZb https://t.co/kXJTtazeOo
we've got a 1M context window open-source llm from a shanghai lab out in the wild and you're still stuck with your rag pipeline? @intern_lm https://t.co/sInjZaA8rL
Microsoft has introduced a new caching technique for language models called 'You Only Cache Once' (YOCO), which cuts memory usage by storing key-value pairs just once and reusing that single cache across the decoder layers, aiming to make large language models (LLMs) more efficient. MemLong, another approach, lets LLMs handle up to 80,000 tokens on a single GPU by storing past context in memory banks, roughly 20x more context than they could otherwise process. In China, Alibaba's Qwen models and the new F5-TTS model show significant progress in open-source AI, and a Shanghai lab has released a 1-million-token context window open-source LLM, with future work targeting context windows of up to 100 million or even 1 billion tokens. Another method, KV cache compression, retains 97% of performance by allocating cache budgets based on attention-head importance.
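To make the "allocate cache budgets based on head importance" idea concrete: instead of giving every attention head the same number of KV slots, you score each head, hand the important heads a bigger share of a global budget, and within each head evict the least-attended positions. Here is a minimal NumPy sketch under my own assumptions — the function name, tensor shapes, and the attention-mass scoring are illustrative, not the actual method from the paper.

```python
import numpy as np

def compress_kv_cache(keys, values, attn_weights, total_budget):
    """
    keys, values : (num_heads, seq_len, head_dim) cached tensors
    attn_weights : (num_heads, seq_len) attention mass each head placed on
                   each cached position (used here as an importance proxy)
    total_budget : total number of KV entries to keep across all heads
    Returns a list of per-head (keys, values) pairs after compression.
    """
    num_heads, seq_len, _ = keys.shape

    # 1. Score each head by the total attention mass it carries.
    head_importance = attn_weights.sum(axis=1)            # (num_heads,)
    head_importance = head_importance / head_importance.sum()

    # 2. Split the global budget proportionally to head importance,
    #    guaranteeing every head keeps at least one entry.
    budgets = np.maximum(1, (head_importance * total_budget).astype(int))

    compressed = []
    for h in range(num_heads):
        k = min(int(budgets[h]), seq_len)
        # 3. Within each head, keep the positions that received the most attention.
        keep = np.argsort(attn_weights[h])[-k:]
        keep.sort()  # preserve original token order
        compressed.append((keys[h, keep], values[h, keep]))
    return compressed

# Tiny usage example with random tensors.
rng = np.random.default_rng(0)
H, T, D = 8, 128, 64
keys = rng.standard_normal((H, T, D))
values = rng.standard_normal((H, T, D))
attn = rng.random((H, T))
out = compress_kv_cache(keys, values, attn, total_budget=256)
print([k.shape for k, _ in out])  # unequal per-head cache sizes
```

The point of the design is that heads which barely attend to long-range context get tiny caches, freeing budget for the heads that actually need it, which is how these methods keep most of the quality while shrinking KV memory.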