Microsoft and Google have both made notable advances in large language models (LLMs) designed to run efficiently on modest hardware. Microsoft released a 1-bit LLM with 2 billion parameters that can run on commodity CPUs, offering faster inference and reduced energy consumption. Its accompanying inference framework, bitnet.cpp, supports 1-bit variants of models including Llama3, Falcon3, and BitNet, and reports up to 6.17 times faster inference and up to 82.2% lower energy use on CPUs.

Meanwhile, Google introduced quantization-aware trained (QAT) versions of Gemma 3, an open-weight model whose 27-billion-parameter variant can run on a single GPU, including consumer-grade cards. QAT cuts the model's weight footprint from 54GB (bf16) to 14.1GB (int4) while reducing the quantization-induced drop in perplexity by 54% compared with post-training quantization. Gemma 3 performs strongly on chat, summarization, classification, and image-recognition tasks.

Google is also exploring novel applications of the Gemma family, such as DolphinGemma, aimed at decoding dolphin communication, which could open new research avenues in interspecies interaction. Together, these developments from both companies suggest a trend toward more accessible and efficient AI models that can be deployed without extensive computational resources.
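Neither company's actual kernels or training code appear in these articles, but both quantization ideas can be sketched in a few lines. Below is a minimal, illustrative PyTorch sketch (all names are invented for this example, not taken from bitnet.cpp or Gemma): an absmean ternary quantizer in the style described in the BitNet b1.58 paper, and int4 fake-quantization with a straight-through estimator, the basic mechanism behind quantization-aware training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def absmean_ternary_quant(w: torch.Tensor):
    """BitNet b1.58-style quantization: weights -> {-1, 0, +1}
    with a single per-tensor scale (the mean absolute weight)."""
    gamma = w.abs().mean().clamp(min=1e-8)      # absmean scale
    w_q = (w / gamma).round().clamp(-1, 1)      # ternary codes
    return w_q, gamma                           # dequantize as w_q * gamma

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    """Symmetric int4 fake-quantization with a straight-through
    estimator, as used during quantization-aware training."""
    scale = w.abs().max().clamp(min=1e-8) / 7.0      # map max |w| to level 7
    w_q = (w / scale).round().clamp(-7, 7) * scale   # quantize, then dequantize
    # Straight-through estimator: the forward pass sees quantized
    # weights, but gradients flow to w as if rounding were the identity.
    return w + (w_q - w).detach()

class QATLinear(nn.Linear):
    """A Linear layer that trains against its own quantization error,
    so the weights learn to tolerate int4 rounding."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, fake_quant_int4(self.weight), self.bias)

# Minimal usage: forward uses quantized weights;
# backward still updates the full-precision weights.
layer = QATLinear(64, 64)
x = torch.randn(8, 64)
layer(x).sum().backward()
```

The reported sizes are also plausibly consistent with simple arithmetic: 27 billion parameters at 2 bytes each (bf16) is roughly 54GB, and at 4 bits (about 0.5 bytes per parameter) roughly 13.5GB, close to the published 14.1GB once embeddings and other non-quantized tensors are accounted for.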
➡️ Gemma 3 QAT models bring state-of-the-art model quality to consumer GPUs. https://t.co/wYbgh1feiC
DolphinGemma: How Google AI is helping to decode dolphin communication – The Keyword Deutschland https://t.co/lDFXtbk605
Google is building an AI model to talk to dolphins | The Daily Star https://t.co/ebaJb8gede