
Researchers from Microsoft Corporation and the University of Surrey have developed MInference (Million-tokens Inference), a training-free method that accelerates the pre-filling stage of long-context large language models (LLMs) using dynamic sparse attention. It assigns each attention head one of three sparse patterns, A-shape, Vertical-Slash, or Block-Sparse, and reports pre-filling speedups of up to 10x without losing accuracy; the roughly 90% latency reduction this implies is the figure highlighted in the coverage below. The code for MInference is open-source, allowing for broader adoption. Additionally, Meta has introduced MobileLLM, a compact language model designed for mobile devices. MobileLLM prioritizes model depth over width, implements embedding sharing and grouped-query attention, and uses immediate block-wise weight sharing. This design aims to make sub-billion-parameter LLMs practical for on-device use, reducing reliance on cloud computing and improving response times. Meta's approach could signal a shift from large-scale models toward smaller, more efficient models for edge devices.
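To make the Vertical-Slash idea concrete, the sketch below builds a dense boolean attention mask that keeps only a few "vertical" key columns, a few "slash" diagonals, and a short local causal window, skipping everything else. The specific index choices are hypothetical illustrations, not MInference's actual per-head pattern search or its optimized sparse kernels.

```python
import numpy as np

def vertical_slash_mask(seq_len, vertical_idx, slash_offsets, local_window=4):
    """Illustrative Vertical-Slash sparse-attention mask.

    True entries are (query, key) pairs whose attention score is computed;
    False entries are skipped. Real implementations never materialize this
    dense mask; they run sparse kernels over the selected indices.
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    rows = np.arange(seq_len)
    # Short causal window near the diagonal (recent tokens).
    for off in range(local_window):
        mask[rows[off:], rows[off:] - off] = True
    # Vertical lines: every later query attends to these key positions.
    for k in vertical_idx:
        mask[rows >= k, k] = True
    # Slash lines: diagonals at fixed offsets behind the main diagonal.
    for off in slash_offsets:
        mask[rows[off:], rows[off:] - off] = True
    # Enforce causality.
    mask &= rows[:, None] >= rows[None, :]
    return mask

# Hypothetical pattern: keys 0 and 3 as verticals, one slash 8 back.
m = vertical_slash_mask(16, vertical_idx=[0, 3], slash_offsets=[8])
density = m.mean()  # fraction of score entries actually computed
```

Because the mask density stays far below 1.0 as the sequence grows, the quadratic attention cost in pre-filling drops roughly in proportion, which is where the reported speedups come from.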

Microsoft drops ‘MInference’ demo, challenges status quo of AI processing: Microsoft unveils MInference, a groundbreaking AI technology that accelerates language model processing by up to 90%, potentially transforming long-context AI… https://t.co/IsZkymwoMf #AI #Automation
Meta AI develops compact language model for mobile devices: Key innovations in Meta's MobileLLM include prioritizing model depth over width, implementing embedding sharing and grouped-query attention and utilizing a novel immediate… https://t.co/yXB9wv9ips #AI #aimodels
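Grouped-query attention, one of the MobileLLM design choices named above, can be sketched as follows: many query heads share a smaller set of key/value heads, shrinking the KV cache that must live in mobile memory. The head counts and shapes below are illustrative assumptions, not MobileLLM's actual configuration.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Minimal grouped-query attention (GQA) sketch.

    q:    (n_q_heads, seq, d)   -- many query heads
    k, v: (n_kv_heads, seq, d)  -- fewer shared key/value heads
    """
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads          # query heads per KV head
    # Broadcast each KV head across its query group instead of storing copies.
    k_rep = np.repeat(k, group, axis=0)      # (n_q_heads, seq, d)
    v_rep = np.repeat(v, group, axis=0)
    scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(d)
    # Softmax over keys, numerically stabilized.
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v_rep

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 5, 16))   # 8 query heads (illustrative)
k = rng.normal(size=(2, 5, 16))   # only 2 KV heads -> 4x smaller KV cache
v = rng.normal(size=(2, 5, 16))
out = grouped_query_attention(q, k, v)
```

The memory saving is the point for edge devices: the KV cache scales with the number of KV heads, so 2 shared heads instead of 8 cuts that cache by 4x while keeping the full complement of query heads.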