Google DeepMind has introduced Gemini Diffusion, an experimental text diffusion model that departs from traditional autoregressive large language models (LLMs). Instead of generating text one token at a time, Gemini Diffusion applies diffusion, a technique widely used in image generation, to produce text at reported speeds of 1,000 to 2,000 tokens per second. Alongside the speedup, the model is reported to match or surpass existing models in coding performance. It also supports features analogous to image inpainting, enabling masked text generation based on prompts. Gemini Diffusion was demonstrated at Google I/O and is currently available via a waitlist. In parallel, research teams have developed multimodal diffusion language models such as MMaDA, LLaDA-V, and LaViDa, which extend diffusion techniques to multimodal tasks including textual reasoning, visual instruction tuning, and text-to-image generation. MMaDA-8B, for example, is reported to outperform models such as Show-o, SEED-X, SDXL, and Janus in multimodal understanding and generation tasks. Together, these advances suggest a shift toward diffusion-based approaches for faster, more versatile language and multimodal AI models.
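To make the contrast with token-by-token generation more concrete, here is a minimal toy sketch of inpainting-style masked text generation: several masked positions are filled per pass, so the number of passes can be smaller than the number of missing tokens. Everything in it is an illustrative assumption: `toy_denoiser` is a hypothetical stand-in for a trained model (it simply reads the answer from a known target), and the `MASK` symbol and the confidence-ordered commit schedule are generic choices, not details of Gemini Diffusion, MMaDA, LLaDA-V, or LaViDa.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for this sketch


def toy_denoiser(tokens, target, rng):
    """Hypothetical stand-in for a trained model.

    For every masked position it proposes a token and a confidence score.
    Here the token is read from `target` and the confidence is random.
    """
    return {i: (target[i], rng.random())
            for i, tok in enumerate(tokens) if tok == MASK}


def diffusion_fill(tokens, target, steps=4, seed=0):
    """Fill all MASK positions in at most `steps` parallel refinement passes."""
    rng = random.Random(seed)
    tokens = list(tokens)
    passes = 0
    for step in range(steps):
        proposals = toy_denoiser(tokens, target, rng)
        if not proposals:
            break  # nothing left to fill
        passes += 1
        remaining_steps = steps - step
        # Commit an even share of the remaining masks, most confident first.
        k = -(-len(proposals) // remaining_steps)  # ceiling division
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = proposals[i][0]
    return tokens, passes


target = "diffusion models refine the whole sequence in parallel".split()
# Inpainting-style prompt: keep the first and last word, mask the six in between.
prompt = [target[0]] + [MASK] * (len(target) - 2) + [target[-1]]
filled, passes = diffusion_fill(prompt, target, steps=4)
```

In this toy run six masked tokens are filled in at most four passes, whereas a strictly one-token-at-a-time decoder would need six sequential steps. Real systems replace `toy_denoiser` with a learned network, and their schedules may differ from the one assumed here.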
LaViDa: A Large Diffusion Language Model for Multimodal Understanding "We introduce LaViDa, a family of VLMs built on DMs. We build LaViDa by equipping DMs with a vision encoder and jointly fine-tune the combined parts for multimodal instruction following." "LaViDa achieves https://t.co/QJkbeiDA0E
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning "In this work, we introduce LLaDA-V, a purely diffusion-based Multimodal Large Language Model (MLLM) that integrates visual instruction tuning with masked diffusion models" "LLaDA-V achieves https://t.co/IIjxpWm20d
Diffusion language models go multimodal! Particularly impressive to see the speed and quality results on visual reasoning benchmarks. Great work led by my students @li78658171, @hbXNov and amazing collaborators. https://t.co/8ryvvC9QF4