Kyutai Labs, a French AI startup, has unveiled Moshi, a real-time multimodal foundation model that can listen, speak, and understand emotions. Moshi runs on consumer laptops and GPUs and is set to be open-sourced, positioning it as a competitive alternative to OpenAI's GPT-4o. Developed by an eight-person team in just six months, the model targets latency under 300ms, reportedly achieving 160ms at a Real-Time Factor of 2, and supports 70 different emotions and speaking styles. Its capabilities include real-time conversation, role-playing, and providing explanations. Despite some initial robotic voice quality, Moshi's fast response times and natural interaction have been well received. The release includes the code, the model, and an accompanying research paper. Under the hood, Moshi pairs a 7B multimodal LM with a two-channel audio I/O system.
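To make the Real-Time Factor claim concrete, here is a minimal sketch of the metric. Note that conventions differ: this sketch assumes RTF is audio duration divided by processing time (higher is faster than real time), while some toolkits use the inverse (lower is better); the function name and numbers are illustrative, not from the Moshi release.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """RTF as audio duration over processing time (higher = faster than real time).

    Caveat: some speech toolkits define RTF the other way around
    (processing time / audio duration, where lower is better).
    """
    return audio_seconds / processing_seconds

# An RTF of 2 means one second of audio is processed in half a second,
# which is the kind of headroom that keeps end-to-end latency low
# (Moshi reportedly reaches about 160 ms).
rtf = real_time_factor(audio_seconds=1.0, processing_seconds=0.5)
print(rtf)  # 2.0
```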
Moshi AI: Real-Time Personal AI Voice Assistant - Beats GPT-4o!: https://t.co/1tYGhIV9Pd Try it Out (US Server): https://t.co/HRfESrOGM3 Try it Out (EU Server): https://t.co/MuiN08Zwbl https://t.co/YRNexd2vMW
Impressive debut of Moshi by @kyutai_labs! While ChatGPT offers a full suite, Moshi's core model shows great promise. Exciting to see how Kyutai builds on this solid foundation.
Moshi and Character AI are both really good voice-call AIs. Moshi is so fast I kinda feel there’s no further need for speed optimisation to reach that personal assistant goal lol