Tencent Hunyuan, in collaboration with Tencent Music, has launched HunyuanVideo-Avatar, an AI model that transforms static photos and audio into dynamic, lifelike videos. The model supports emotion-controlled animations and multi-character scenarios with separate audio controls, and works across styles including cartoon, 3D, and real faces while preserving the subject's identity. It automatically detects scene context and emotion to generate realistic speech and singing animations. The single-character mode, which generates video from up to 14 seconds of audio, has been open-sourced and is available on the Tencent Hunyuan website; multi-character support is expected to follow soon.

Concurrently, Hume has introduced EVI 3, a personalized voice AI model that responds within 300 milliseconds and can mimic any voice with a personalized tone. Other advances in voice AI include Rime's Arcana TTS model, which captures natural vocal nuances such as laughter, accents, and sighs, and updates to NVIDIA's ACE suite that convert text to speech in multiple languages and turn audio into real-time facial animations via Audio2Face-3D. Together, these developments highlight rapid progress in AI-driven voice and avatar technologies aimed at enhancing human-machine interaction and content creation.
🤯This is the first time Voice #AI has felt… genuinely human. It responded with a natural voice, understood the intent, & came back with a genuinely clever answer. We can actually see tech like this being used across enterprise. @AI_NURIX moving fast! https://t.co/MHp42mIduv https://t.co/LqxO0o1kUT
🎤 Tencent’s new Hunyuan Avatar = AI magic. Upload a photo + voice = full animated video. → Realistic speech → Emotions + gestures → Even singing Creators, this is your new superpower. https://t.co/NxTEBhMUDI https://t.co/Ei6p6QeywV
Quick demo of Voiceflow's new V3 Voice AI Architecture - way faster - with all the power of Voiceflow's orchestration platform⚡️ https://t.co/sZrEgBprKo