Apr 24, 10:41 AM

Nari Labs' Dia: 1.6B Parameter Open-Source TTS Model With Zero-Shot Voice Cloning, Non-Verbal Sounds

Nari Labs, founded by two Korean undergraduate students, has launched Dia, a 1.6-billion-parameter open-source text-to-speech (TTS) model. Built without outside funding, using compute from Google's TPU Research Cloud and a Hugging Face ZeroGPU grant, Dia is designed to produce naturalistic dialogue from text prompts and is released under the Apache 2.0 license for both commercial and academic use.

Dia's features include zero-shot voice cloning, synthesis of nonverbal sounds such as coughing and laughter, support for multiple speakers, and real-time speech synthesis on consumer-grade hardware. Emotional tone, speaker tags, and nonverbal audio cues are all controlled from plain text. The model is currently English-only, requires about 10GB of VRAM, and runs on PyTorch 2.0+ with CUDA 12.6. Developers can get Dia from GitHub or Hugging Face, with deployment options including a Python library, a CLI tool, and a Gradio-based demo.

According to early evaluations and direct comparisons, Dia outperforms proprietary TTS products such as ElevenLabs Studio, Sesame, and OpenAI's gpt-4o-mini-tts, especially in expressiveness, timing, and handling of nonverbal behaviors. Nari Labs prohibits the use of Dia for impersonation, misinformation, or illegal activities, and encourages responsible experimentation. The model has quickly gained attention in the open-source AI community as an accessible and customizable alternative to commercial TTS platforms.
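To make the idea of controlling speakers and nonverbal cues "from plain text" concrete, here is a minimal Python sketch that assembles a dialogue script as a single string. The tag conventions used here (`[S1]`, `[S2]` for speakers and parenthesized cues such as `(laughs)`) are assumptions for illustration and are not stated in this story; the `Turn` class and `build_script` helper are hypothetical names, not part of Dia's library. Check the project's own documentation for the exact syntax it expects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One line of dialogue (hypothetical helper type, not part of Dia)."""
    speaker: int                 # 1-based speaker index
    text: str                    # what the speaker says
    cue: Optional[str] = None    # optional nonverbal cue, e.g. "laughs"


def build_script(turns: List[Turn]) -> str:
    """Join dialogue turns into one plain-text prompt.

    Assumed format: "[S<n>] text (cue)" per turn, separated by spaces.
    """
    parts = []
    for turn in turns:
        if turn.speaker < 1:
            raise ValueError("speaker index must be 1 or greater")
        segment = f"[S{turn.speaker}] {turn.text.strip()}"
        if turn.cue:
            segment += f" ({turn.cue.strip()})"
        parts.append(segment)
    return " ".join(parts)


script = build_script([
    Turn(1, "Did you hear about the new open-source TTS model?"),
    Turn(2, "I did. It even handles laughter.", cue="laughs"),
])
print(script)
```

The resulting string is what would be handed to the model's text-to-speech entry point, whether through the Python library, the CLI tool, or the Gradio demo mentioned above.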

#Nari Labs #Korean #Dia #Google #TPU Research Cloud #Hugging Face ZeroGPU #Apache #English #PyTorch #CUDA #GitHub #Hugging Face #Python #CLI #Gradio #ElevenLabs Studio #Sesame #OpenAI #TTS

Written with ChatGPT (GPT-4).
