We just made state-of-the-art TTS 20x more affordable. $5 per million characters. And we're open sourcing the training and modeling code (built on Llama). Because scaling voice AI shouldn't break your budget. Technical Details → https://t.co/MhXc9XZI71 Why and how we did it https://t.co/hV8qCkyCi3
Congratulations to the team at @inworld_ai, setting a new standard for real time speech models! A leap forward making TTS more accessible for products and use-cases everywhere.🚀 The tech + collaboration powering this is next-level. We'll have more to share soon! 🔥 https://t.co/nsyNmeIze7
We just made state-of-the-art TTS 20x more affordable. $5 per million characters. And we're making the training and modeling code open-source (built on Llama). Because scaling voice AI shouldn't break your budget. Technical details → https://t.co/MhXc9XZI71 Why and how https://t.co/AEA9gKwYUB
Inworld AI has released a new-generation text-to-speech system, TTS-1, that the company says cuts the cost of high-quality voice synthesis to US$5 per million characters, about one-twentieth of the prevailing market rate. The model, available immediately via API and in a browser-based playground, is designed for real-time applications such as gaming avatars, virtual assistants and fitness trainers, and delivers the first two seconds of audio in as little as 200 milliseconds. The launch includes an experimental, more expressive variant dubbed TTS-1-Max, along with free zero-shot voice cloning that can replicate a speaker from a brief sample. Inworld is also opening its Llama-based training and inference code under a commercially permissive licence and has pledged to publish a detailed technical report in the coming weeks. The company says the model supports 11 languages and embeds an imperceptible watermark to flag AI-generated audio. Safeguards are in place to prevent unauthorized voice cloning, and further capabilities, such as generating voices from text descriptions, are under development.
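To put the pricing claim in concrete terms, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name and the sample workload of 2.5 million characters are hypothetical, and the US$100 baseline is not a quoted figure but is simply implied by multiplying the stated US$5 price by the claimed 20x reduction.

```python
# Illustrative cost arithmetic only; not an official pricing calculator.
PRICE_PER_MILLION_USD = 5.0   # price quoted in the announcement
COST_REDUCTION_FACTOR = 20    # the "20x more affordable" claim

# Assumption: the prevailing rate is inferred from the two figures above.
IMPLIED_BASELINE_PER_MILLION_USD = PRICE_PER_MILLION_USD * COST_REDUCTION_FACTOR


def tts_cost_usd(characters: int, price_per_million: float) -> float:
    """Return the cost in US dollars of synthesizing `characters` characters."""
    return characters / 1_000_000 * price_per_million


# Hypothetical workload: 2.5 million characters of synthesized speech.
sample_characters = 2_500_000
new_cost = tts_cost_usd(sample_characters, PRICE_PER_MILLION_USD)
baseline_cost = tts_cost_usd(sample_characters, IMPLIED_BASELINE_PER_MILLION_USD)

print(f"At the quoted rate:    ${new_cost:,.2f}")
print(f"At the implied former rate: ${baseline_cost:,.2f}")
```

Under these assumptions, the same workload costs US$12.50 at the quoted rate versus US$250.00 at the implied prevailing rate.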