ByteDance has unveiled its new AI model, OmniHuman-1, which generates videos with realistic lip synchronization, even from side views, and can control facial expressions and gestures from audio and text inputs. The model is expected to have a significant impact on Hollywood movie production by improving the quality and efficiency of video creation. OmniHuman-1 has already demonstrated its capabilities by generating music videos from images. The model marks a notable advance in video generation and underscores China's growing strength in artificial intelligence.
🇨🇳China's AI keeps getting better and better. ByteDance's new AI model, OmniHuman-1, can generate videos with realistic lip sync, even from a side view. It can also control facial expressions and gestures using just audio and text. This is what happens when your country… https://t.co/cvUxNGjTh0
Awesome research from ByteDance continues. Current subject-to-video methods merge text prompts and reference images to produce consistent videos, yet many approaches fail to preserve subject fidelity. This new research, Phantom, merges text and reference-image features in a… https://t.co/VOGkseW22N
Text-to-image generation struggles with balancing contradictory objectives like prompt faithfulness and artistic freedom. This paper introduces YinYang-Align, a benchmark to evaluate Text-to-Image alignment across six contradictory objectives. It also presents Contradictory… https://t.co/8qy2suvQ24