Ultravox, a new open-source multimodal language model developed by FixieAI, is gaining attention for its real-time voice capabilities. The latest release, Ultravox v0.4.1, is an 8-billion-parameter model whose speech-understanding performance reportedly approaches that of GPT-4o. It ingests both text and human speech directly, with no separate automatic speech recognition (ASR) stage. Checkpoints are pre-trained on Llama 3.1 8B and 70B backbones with a Whisper-based audio encoder, and the model currently supports text output, with a time to first response of roughly 150 milliseconds. The checkpoints are MIT-licensed, so in principle any LLM can be paired with the audio encoder by training a new adapter, making the model accessible for further development.
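The core idea behind this class of audio LLMs can be sketched in a few lines: a frozen audio encoder (Whisper-style) turns speech into a sequence of embeddings, and a small trained projector maps those embeddings into the LLM's token-embedding space, so speech is consumed like ordinary tokens with no intermediate transcript. The toy sketch below illustrates only the data flow; all dimensions, function names, and the linear projector are illustrative assumptions, not Ultravox's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D_AUDIO = 1280   # hypothetical audio-encoder hidden size
D_LLM = 4096     # hypothetical LLM embedding size

def audio_encoder(waveform: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen Whisper-style encoder: one vector per frame."""
    n_frames = len(waveform) // 320          # pretend 320 samples per frame
    return rng.standard_normal((n_frames, D_AUDIO))

# The trainable piece: a projector from audio space into LLM embedding space.
W_proj = rng.standard_normal((D_AUDIO, D_LLM)) * 0.01

def project_audio(audio_feats: np.ndarray) -> np.ndarray:
    # (n_frames, D_AUDIO) @ (D_AUDIO, D_LLM) -> (n_frames, D_LLM)
    return audio_feats @ W_proj

# Splice the projected audio "tokens" between text-token embeddings,
# where an audio placeholder would sit in the prompt.
text_before = rng.standard_normal((5, D_LLM))   # e.g. system/prompt tokens
text_after = rng.standard_normal((3, D_LLM))    # e.g. trailing instruction

speech = rng.standard_normal(16000)             # 1 s of fake 16 kHz audio
audio_tokens = project_audio(audio_encoder(speech))

llm_input = np.concatenate([text_before, audio_tokens, text_after], axis=0)
print(llm_input.shape)   # (5 + 50 + 3, 4096) -> (58, 4096)
```

Because only the projector (and optionally a LoRA-style adapter on the LLM) needs training while the encoder and backbone stay frozen, swapping in a different LLM is comparatively cheap, which is what makes the "pick any LLM, train an adapter" recipe practical.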