
OpenAI recently introduced GPT-4o, an AI voice assistant that mimics human speech, but its voice mode is not yet available and it cannot listen and respond at the same time. Kyutai Labs has now released Moshi, a real-time native multimodal foundation model that can listen and speak simultaneously, a capability GPT-4o lacks. Moshi can also express and understand emotions and speaking styles, for example speaking with a French accent. The model will be open-sourced, and a smaller variant can run on laptops, making it accessible to more users. This release raises the question of whether Moshi represents a significant advance over OpenAI's impressive offerings.
If you can't wait for 4o voice, then there's good news for you. This startup created a multimodal model with real-time audio. It can even listen and respond at the same time, something that even OpenAI's GPT-4o can't do. Best part: this will be totally open-sourced! https://t.co/yPoy3WZD9z
Really fast inference with Moshi, the new real-time audio model released by @kyutai_labs. It will be released open source, and a smaller variant can run on laptops. Watch the demo here: https://t.co/BlgBddWha4 https://t.co/sGcfnKcuUl
Did Open Science just beat @OpenAI? 🤯 @kyutai_labs just released Moshi, a real-time native multimodal foundation model that can listen and speak, similar to what OpenAI demoed with GPT-4o in May. 👀 Moshi: > Expresses and understands emotions, e.g. speaking with a "French accent" >… https://t.co/PFIcUp2zzD
