Researchers from Stanford University have introduced a new method to improve the cost-efficiency of large language models (LLMs). The approach, dubbed MinionS, pairs small on-device LLMs with more powerful cloud-based models to handle complex reasoning tasks more economically. In the MinionS protocol, the cloud model decomposes a task into simpler subtasks that are executed locally on-device, so the long context never has to be sent to the cloud; on financial, medical, and scientific reasoning tasks over long documents, this reduces cloud inference costs by an average of 5.7x while retaining 97.9% of the cloud model's standalone performance. The method addresses the difficulty small models have in following multi-step instructions and reasoning over long contexts by keeping the data and most of the computation local. A naive version of the protocol, Minion, in which the two models simply converse, achieves a 30.4x reduction in remote costs but recovers only 87% of the frontier model's performance. The digest also covers other advances in LLM efficiency, including LightThinker, a method from Zhejiang University and Ant Group that dynamically compresses intermediate reasoning steps during generation, and serving frameworks such as vLLM, LMDeploy, and SGLang, which optimize LLM inference through techniques such as PagedAttention, persistent batching, and RadixAttention prefix caching, respectively.
[LG] Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models A Narayan, D Biderman, S Eyuboglu, A May... [Stanford University] (2025) https://t.co/GljQ2pE1tK https://t.co/yLY3m7H33P
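To make the division of labor concrete, here is a minimal Python sketch of one decompose-execute-aggregate round in the spirit of the MinionS protocol. The cloud_chat and local_chat helpers are hypothetical stand-ins for API calls to the frontier and on-device models; the actual protocol also chunks the document and iterates over multiple rounds, which this sketch omits.

```python
import json

def cloud_chat(prompt: str) -> str:
    """Placeholder for a call to the frontier cloud model (e.g., an API request)."""
    raise NotImplementedError

def local_chat(prompt: str) -> str:
    """Placeholder for a call to the small on-device model (e.g., a local server)."""
    raise NotImplementedError

def minions_round(task: str, document: str) -> str:
    # 1. The cloud model sees only the task, not the long document,
    #    and decomposes it into simple, independently answerable subtasks.
    plan = cloud_chat(
        "Decompose this task into independent subtasks, one per line, "
        f"each answerable from excerpts of a document:\n{task}"
    )
    subtasks = [line for line in plan.splitlines() if line.strip()]

    # 2. Each subtask runs locally, where the full document lives, so the
    #    long context incurs no cloud token costs.
    results = []
    for sub in subtasks:
        answer = local_chat(f"Document:\n{document}\n\nSubtask: {sub}")
        results.append({"subtask": sub, "answer": answer})

    # 3. The cloud model aggregates the short local answers into a final response.
    return cloud_chat(
        f"Task: {task}\nSubtask results:\n{json.dumps(results, indent=2)}\n"
        "Synthesize a final answer, or reply RETRY if the results are insufficient."
    )
```

The cost savings come from step 2: only the short subtask descriptions and their answers cross the network, not the full document.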
Running LLMs locally seemed impossible until Ollama came along. Our 2023 deep-dive on scaling these models is even more relevant today as edge AI takes off. Learn how we tackled memory constraints and optimized inference, and why running AI on your own hardware still matters in 2025… https://t.co/e3JNSJrwkJ
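For readers who want to try this, below is a minimal sketch of calling a locally hosted model through Ollama's REST API. It assumes an Ollama server running on its default port (11434) and a model, here "llama3" as an example, that has already been pulled with `ollama pull`.

```python
import requests

def generate_locally(prompt: str, model: str = "llama3") -> str:
    # Ollama exposes a local HTTP endpoint; stream=False returns one JSON object.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(generate_locally("Why does running models on-device reduce costs?"))
```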
Exciting New Research: Injecting Domain-Specific Knowledge into Large Language Models I just came across a fascinating, comprehensive survey on enhancing Large Language Models (LLMs) with domain-specific knowledge. While LLMs like GPT-4 have shown remarkable general capabilities,… https://t.co/xG1rkfUhVl
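As an illustration of one approach such surveys typically cover, a common way to inject domain knowledge without retraining is to retrieve relevant snippets and prepend them to the prompt. The sketch below is a toy version of that idea: retrieve scores documents by naive keyword overlap (a real system would use embeddings), and llm is a hypothetical stand-in for any text-generation call.

```python
def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    # Score each document by how many query words it contains (toy heuristic).
    words = query.lower().split()
    scored = sorted(corpus, key=lambda doc: -sum(w in doc.lower() for w in words))
    return scored[:k]

def answer_with_domain_knowledge(query: str, corpus: list[str], llm) -> str:
    # Prepend retrieved domain context so the model grounds its answer in it.
    context = "\n".join(retrieve(query, corpus))
    prompt = (
        f"Use the following domain context to answer.\nContext:\n{context}\n\n"
        f"Question: {query}"
    )
    return llm(prompt)
```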