Feb 11, 02:12 PM

Zonos: ZyphraAI Releases Two Open-Source 1.6B TTS Models with High-Fidelity Voice Cloning

ZyphraAI has released Zonos-v0.1 in beta, an open-source text-to-speech (TTS) release with high-fidelity voice cloning and expressive speech. It comes in two variants of 1.6 billion parameters each, one built on a transformer architecture and the other on a state-space model (SSM) architecture. Zonos can clone a voice from a 5 to 30 second reference clip and exposes controls for speaking rate, pitch, audio quality, and emotional tone. Supplying text and audio prefixes can further improve speaker matching. The model is reported to generate speech at roughly twice real-time speed on an RTX 4090 graphics card, and it supports multilingual output.
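To make the "twice real time" figure concrete, here is a minimal sketch of the arithmetic. The function names `realtime_speedup` and `generation_seconds` and the sample numbers are illustrative assumptions, not part of Zonos or of any reported benchmark.

```python
def realtime_speedup(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Seconds of audio produced per second of compute (higher is faster)."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return audio_seconds / wall_clock_seconds


def generation_seconds(audio_seconds: float, speedup: float) -> float:
    """Estimated compute time needed to synthesize a clip at a given speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# Hypothetical example: 10 s of speech synthesized in 5 s of wall-clock time.
speedup = realtime_speedup(audio_seconds=10.0, wall_clock_seconds=5.0)

# At that rate, one minute of speech would take about half a minute to produce.
minute_estimate = generation_seconds(audio_seconds=60.0, speedup=speedup)
```

A speedup above 1.0 means audio is produced faster than it plays back, which is the basic requirement for streaming speech without gaps.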

#Zonos #ZyphraAI

Written with ChatGPT (GPT-4o mini).

Sources

Frank ⚡@jedisct1
1 year ago
Beta Release of Zonos-v0.1 - two expressive and real-time text-to-speech (TTS) models with high-fidelity voice cloning https://t.co/wfhZhgWr1j https://t.co/eXBsCijbOf
Awni Hannun@awnihannun
1 year ago
Request-for-port: Zonos in MLX / MLX Swift. Let's run high-quality, expressive TTS + voice cloning fast on-device. https://t.co/ktpizrkNYa
bitlauncher@bitlauncherai
1 year ago
Zyphra’s dropping some serious tech! Two 1.6B TTS models and voice cloning with open-weights? Impressive. Definitely checking this out! 🔥🎧 https://t.co/PyxAkYZNfF
