Feb 28, 11:56 AM

Hume AI's Octave TTS Model Joins Sesame and ElevenLabs in Advancing Voice Technology

Hume AI has launched Octave, a new text-to-speech (TTS) model designed to create custom AI voices with tailored emotions. Octave is built on a large language model (LLM) trained specifically for TTS, which lets it understand the context of a text and generate speech that reflects subtle meanings, emotions, and styles. Users can design voices with prompts and give acting instructions to control the emotional delivery of the speech, making the model suitable for applications in education, entertainment, and customer service. An upcoming Voice Cloning feature will allow voices to be replicated from brief audio samples. In an internal study, Octave was preferred for audio quality (71.6%), naturalness (51.7%), and fidelity to descriptions (57.7%). Hume AI's Expressive TTS Arena invites the public to evaluate and compare different TTS systems, feeding ongoing improvements to Octave.

Sesame has introduced a new voice model that aims to achieve "voice presence," a quality that makes spoken interactions feel real, understood, and valued. The model, showcased in a demo featuring AI companions named Maya and Miles, focuses on emotional intelligence, conversational dynamics, contextual awareness, and consistent personality, with the goal of more natural and engaging AI conversations. Sesame's research is based on a dataset of approximately 1 million hours of predominantly English audio. The company trained three model sizes: Tiny (1B backbone, 100M decoder), Small (3B backbone, 250M decoder), and Medium (8B backbone, 300M decoder). It plans to open-source its research under an Apache 2.0 license to foster collaborative development.

ElevenLabs has entered the speech-to-text market with Scribe, a model that the company claims surpasses competitors such as Gemini 2.0 and Whisper v3 in accuracy. Scribe supports transcription in 99 languages, positioning ElevenLabs as a significant player in AI-powered voice technology. The launch follows ElevenLabs' recent $180 million funding round, which indicates strong investor backing for expanding its voice technology capabilities.