Microsoft and Google have both made notable advances in large language models (LLMs) designed to run efficiently on modest hardware. Microsoft released a 1-bit LLM with 2 billion parameters that can run on commodity CPUs, offering faster inference and reduced energy consumption. Its accompanying inference framework, bitnet.cpp, supports 1-bit variants of models including Llama3, Falcon3, and BitNet, and reports up to 6.17 times faster inference and up to 82.2% lower energy use on CPUs.

Meanwhile, Google introduced quantization-aware trained (QAT) versions of Gemma 3, an open-weight model whose 27-billion-parameter variant can run on a single GPU, including consumer-grade cards. QAT cuts the model's weight footprint from 54GB (bf16) to 14.1GB (int4) while reducing the quantization-induced drop in perplexity by 54% compared with post-training quantization. Gemma 3 performs strongly on chat, summarization, classification, and image-recognition tasks.

Google is also exploring novel applications of the Gemma family, such as DolphinGemma, aimed at decoding dolphin communication, which could open new research avenues in interspecies interaction. Together, these developments from both companies suggest a trend toward more accessible and efficient AI models that can be deployed without extensive computational resources.
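Neither company's actual kernels or training code appear in these articles, but both quantization ideas can be sketched in a few lines. Below is a minimal, illustrative PyTorch sketch (all names are invented for this example, not taken from bitnet.cpp or Gemma): an absmean ternary quantizer in the style described in the BitNet b1.58 paper, and int4 fake-quantization with a straight-through estimator, the basic mechanism behind quantization-aware training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def absmean_ternary_quant(w: torch.Tensor):
    """BitNet b1.58-style quantization: weights -> {-1, 0, +1}
    with a single per-tensor scale (the mean absolute weight)."""
    gamma = w.abs().mean().clamp(min=1e-8)      # absmean scale
    w_q = (w / gamma).round().clamp(-1, 1)      # ternary codes
    return w_q, gamma                           # dequantize as w_q * gamma

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    """Symmetric int4 fake-quantization with a straight-through
    estimator, as used during quantization-aware training."""
    scale = w.abs().max().clamp(min=1e-8) / 7.0      # map max |w| to level 7
    w_q = (w / scale).round().clamp(-7, 7) * scale   # quantize, then dequantize
    # Straight-through estimator: the forward pass sees quantized
    # weights, but gradients flow to w as if rounding were the identity.
    return w + (w_q - w).detach()

class QATLinear(nn.Linear):
    """A Linear layer that trains against its own quantization error,
    so the weights learn to tolerate int4 rounding."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, fake_quant_int4(self.weight), self.bias)

# Minimal usage: forward uses quantized weights;
# backward still updates the full-precision weights.
layer = QATLinear(64, 64)
x = torch.randn(8, 64)
layer(x).sum().backward()
```

The reported sizes are also plausibly consistent with simple arithmetic: 27 billion parameters at 2 bytes each (bf16) is roughly 54GB, and at 4 bits (about 0.5 bytes per parameter) roughly 13.5GB, close to the published 14.1GB once embeddings and other non-quantized tensors are accounted for.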
➡️ Gemma 3 QAT models bring state-of-the-art model quality to consumer GPUs. https://t.co/wYbgh1feiC
DolphinGemma: How Google AI is helping to decode dolphin communication – The Keyword Deutschland https://t.co/lDFXtbk605
Google is building an AI model to talk to dolphins | The Daily Star https://t.co/ebaJb8gede