Aug 5, 09:15 PM

Alibaba Unveils Qwen-Image 20B MMDiT Model with English-Chinese Text Rendering and New 4B Qwen3 Models

Alibaba's Qwen team has launched Qwen-Image, a 20-billion-parameter open-source text-to-image model that excels at rendering complex multi-line and paragraph-level text in both English and Chinese. Built on the MMDiT (Multimodal Diffusion Transformer) architecture, the model delivers state-of-the-art text rendering directly within the generated pixels and supports a wide range of image styles, including photorealistic, anime, cyberpunk, sci-fi, minimalist, retro, surreal, and ink wash. Qwen-Image is particularly strong at producing graphic posters with natively rendered text and at precise image editing, such as changing styles and manipulating objects. In benchmark tests it rivals leading models such as GPT-4o, Imagen 3, and FLUX.1 Kontext, particularly in Chinese text rendering. The model is released under the Apache 2.0 license, so developers can use and deploy it freely.

Alongside Qwen-Image, the Qwen team released two upgraded 4-billion-parameter models, Qwen3-4B-Instruct-2507 and Qwen3-4B-Thinking-2507. Both offer enhanced general capabilities, broader multilingual coverage, better long-context instruction following, and stronger reasoning in logic, math, science, and coding, and both support a 256K-token context window. These smaller models are optimized for edge devices and have been integrated into platforms such as AnyCoder and Jan Hub for local deployment.

Qwen-Image is available on Hugging Face, Replicate, and Yupp, and is being evaluated on community benchmarking platforms such as the Artificial Analysis Image Arena and LMArena.