Nari Labs, founded by two Korean undergraduate students, has launched Dia, a 1.6-billion-parameter open-source text-to-speech (TTS) model. Developed with zero funding, but with compute support from Google's TPU Research Cloud and a Hugging Face ZeroGPU grant, Dia is designed to produce naturalistic dialogue from text prompts and is available under an Apache 2.0 license for both commercial and academic use. Dia offers zero-shot voice cloning, synthesis of nonverbal sounds such as coughing and laughter, support for multiple speakers, and real-time speech synthesis on consumer-grade hardware. Its controls, including emotional tone, speaker tagging, and nonverbal audio cues, are all specified in plain text. The model is currently English-only, requires about 10 GB of VRAM, and runs on PyTorch 2.0+ with CUDA 12.6. Developers can access Dia via GitHub or Hugging Face, with deployment options including a Python library, a CLI tool, and a Gradio-based demo. Early evaluations and direct comparisons suggest that Dia outperforms proprietary TTS solutions such as ElevenLabs Studio, Sesame, and OpenAI's gpt-4o-mini-tts, especially in expressiveness, timing, and handling of nonverbal behaviors. Nari Labs prohibits the use of Dia for impersonation, misinformation, or illegal activities, and encourages responsible experimentation. The model has quickly gained attention in the open-source AI community as an accessible and customizable alternative to commercial TTS platforms.
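To make the "all from plain text" point concrete, here is a minimal sketch of how a multi-speaker script with a nonverbal cue might be assembled before being handed to a TTS model. The `[S1]`/`[S2]` speaker tags and the parenthesized cue are assumptions about the prompt convention, not something the summary above specifies, and `Turn` and `build_script` are hypothetical helper names invented for illustration. The sketch only builds the text; it does not call Dia.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: int   # 1-based speaker index
    text: str      # what the speaker says
    cue: str = ""  # optional nonverbal cue, e.g. "laughs"


def build_script(turns):
    """Join turns into one plain-text prompt.

    Assumed convention (not confirmed by the summary): each turn is
    prefixed with a tag such as [S1], and a nonverbal cue is appended
    in parentheses.
    """
    parts = []
    for turn in turns:
        line = f"[S{turn.speaker}] {turn.text.strip()}"
        if turn.cue:
            line += f" ({turn.cue})"
        parts.append(line)
    return " ".join(parts)


script = build_script([
    Turn(1, "Did you hear about the new open-source TTS model?"),
    Turn(2, "I did, it even handles laughter.", cue="laughs"),
])
print(script)
```

The resulting string would then be passed to whichever interface is in use (Python library, CLI, or Gradio demo); check the project's own documentation for the exact tags and cues it accepts.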
https://t.co/I7euHx0juK Tavus just launched (on Fal) their hummingbird lipsync model. It's pretty cool, though hard to say with confidence "this is SOTA". There are two approaches to content repurposing: avatar models (@hedra_labs, @HeyGen_Official) that will take an image and https://t.co/93GVVqjB82
Hummingbird-0 by @heytavus is a zero-shot, realistic AI lip-syncing tool for video. Drop any MP4 video file + MP3 audio file, and get a perfectly lip-sync'd clip in a minute. Veo/Kling + ElevenLabs + Tavus = Hollywood on your laptop. Links below to try it: https://t.co/kdW3UCI9jH
Hummingbird-0 by @heytavus is a zero-shot, photorealistic AI lip-syncing tool for video. Drop any MP4 video file + MP3 audio file, and get a perfectly lip-sync'd clip in about a minute. Veo/Kling + ElevenLabs + Tavus = Hollywood on your laptop. Links below to try it: