Alibaba Group has launched Qwen2.5-Omni-7B, a 7-billion-parameter multimodal AI model that accepts text, image, audio, and video inputs and generates streaming text and natural speech responses in real time. The model is open source and available on platforms such as Hugging Face and GitHub. Qwen2.5-Omni-7B uses a Thinker-Talker architecture and is sized to run on edge devices such as smartphones and laptops, keeping deployment costs low for AI applications. It supports real-time voice and video chat, making it suitable for intelligent voice assistants and accessibility tools such as real-time audio descriptions for visually impaired users. Alibaba positions the model as a competitor in the growing multimodal AI market, reporting that it outperforms Google's Gemini-1.5-Pro on multimodal benchmarks such as OmniBench. The company plans to use the model to build AI agents and has committed to investing $53 billion in AI and cloud infrastructure over the next three years.
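As a rough illustration of what the end-to-end multimodal claim means in practice, the sketch below loads the open-weight checkpoint from Hugging Face and asks it about a video clip, getting back both text and a spoken reply. It is a minimal sketch following the usage pattern published on the Qwen2.5-Omni-7B model card; the `Qwen2_5OmniForConditionalGeneration` / `Qwen2_5OmniProcessor` class names, the `qwen_omni_utils` helper, and the placeholder video URL are taken from or modeled on that card rather than verified here, and they assume a transformers build with Qwen2.5-Omni support.

```python
# Minimal sketch of text + speech generation with Qwen2.5-Omni-7B, adapted from
# the Hugging Face model card. Class names and the qwen_omni_utils helper follow
# that card and assume a transformers version with Qwen2.5-Omni support.
import soundfile as sf
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info  # helper package referenced by the model card

model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-Omni-7B", torch_dtype="auto", device_map="auto"
)
processor = Qwen2_5OmniProcessor.from_pretrained("Qwen/Qwen2.5-Omni-7B")

# System prompt recommended by the model card for enabling speech output,
# followed by a user turn that mixes video and text (images and raw audio
# use the same message format).
conversation = [
    {
        "role": "system",
        "content": [{"type": "text", "text": (
            "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, "
            "capable of perceiving auditory and visual inputs, as well as generating text and speech."
        )}],
    },
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "https://example.com/clip.mp4"},  # placeholder URL
            {"type": "text", "text": "Describe what is happening in this clip."},
        ],
    },
]

text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=True)
inputs = processor(
    text=text, audio=audios, images=images, videos=videos,
    return_tensors="pt", padding=True, use_audio_in_video=True,
).to(model.device)

# The Talker head returns a speech waveform alongside the generated token ids.
text_ids, audio = model.generate(**inputs, use_audio_in_video=True)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
sf.write("reply.wav", audio.reshape(-1).detach().float().cpu().numpy(), samplerate=24000)
```

For edge-style deployments that only need text output, the model card also notes that the speech (Talker) head can be disabled to reduce GPU memory use.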
Alibaba launches new open-source AI model for ‘cost-effective AI agents’ https://t.co/uxwRSd2M6Q #OODA
🔔 Now live on Together AI: Qwen2.5-VL 72B Instruct This flagship vision-language model by @Alibaba_Qwen brings advanced visual understanding, long video comprehension, agentic capabilities & structured outputs. Details below 👇 https://t.co/rCXBmFpUrm
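For readers who want to try the Together AI deployment mentioned in the post above, here is a hypothetical sketch using Together's OpenAI-compatible chat completions endpoint; the base URL, the `Qwen/Qwen2.5-VL-72B-Instruct` model ID, and the image-message format are assumptions based on Together's usual conventions, so check the Together AI docs for current values.

```python
# Hypothetical sketch: querying Qwen2.5-VL 72B Instruct through Together AI's
# OpenAI-compatible chat completions endpoint. Model ID and image-message
# format are assumptions; verify against the Together AI documentation.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],      # Together AI key, not an OpenAI key
    base_url="https://api.together.xyz/v1",      # Together's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct",        # assumed model ID on Together AI
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)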
ICYMI - Alibaba just released Qwen2.5-Omni-7B! A fully open-source, end-to-end multimodal model that handles text, images, audio, and video, and responds in real time with text or speech! 🔥 ↳ Real-time voice & video chat ↳ "Thinker-Talker" architecture (reason + speak like a https://t.co/9MkC4XfPOT