Kyutai Labs, a French AI startup, has unveiled Moshi, a real-time multimodal foundation model that can listen, speak, and understand emotions. Moshi runs on consumer laptops and GPUs and is set to be open-sourced, positioning it as a competitive alternative to OpenAI's GPT-4o. Developed by an eight-person team in just six months, the model targets latency under 300ms, reportedly achieving 160ms at a Real-Time Factor of 2, and supports 70 different emotions and speaking styles. Its capabilities include real-time conversation, role-playing, and providing explanations. Despite some initial robotic voice quality, Moshi's fast response times and natural interaction have been well received. The release includes the code, the model, and an accompanying research paper. Under the hood, Moshi pairs a 7B multimodal LM with a two-channel audio I/O system.
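To make the Real-Time Factor claim concrete, here is a minimal sketch of the metric. Note that conventions differ: this sketch assumes RTF is audio duration divided by processing time (higher is faster than real time), while some toolkits use the inverse (lower is better); the function name and numbers are illustrative, not from the Moshi release.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """RTF as audio duration over processing time (higher = faster than real time).

    Caveat: some speech toolkits define RTF the other way around
    (processing time / audio duration, where lower is better).
    """
    return audio_seconds / processing_seconds

# An RTF of 2 means one second of audio is processed in half a second,
# which is the kind of headroom that keeps end-to-end latency low
# (Moshi reportedly reaches about 160 ms).
rtf = real_time_factor(audio_seconds=1.0, processing_seconds=0.5)
print(rtf)  # 2.0
```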
Moshi AI: Real-Time Personal AI Voice Assistant - Beats GPT-4o!: https://t.co/1tYGhIV9Pd Try it Out (US Server): https://t.co/HRfESrOGM3 Try it Out (EU Server): https://t.co/MuiN08Zwbl https://t.co/YRNexd2vMW
Impressive debut of Moshi by @kyutai_labs! While ChatGPT offers a full suite, Moshi's core model shows great promise. Exciting to see how Kyutai builds on this solid foundation.
Moshi and Character AI are both really good voice-call AIs. Moshi is so fast I kinda feel there’s no further need for speed optimisation to reach that personal assistant goal lol