BAGEL is the open-source Unified Multimodal Model you can fine-tune, distill, and deploy anywhere. It offers functionality comparable to proprietary systems like GPT-4o and Gemini 2.0 in an open form, and unlocks useful image generation through a natively multimodal design https://t.co/B2N6G17RGd
'Tis the year of any-to-any/omni models. BAGEL by @BytedanceTalk is a 7B native multimodal model that understands and generates both image + text. It outperforms leading VLMs like Qwen 2.5-VL 👏 and has an Apache 2.0 license 😱 https://t.co/sjFIBtDCcV
ByteDance released a 37-page report on training a Gemini-like native multimodal model! The most interesting part imo is the "Integrated Transformer" architecture, where the same backbone acts both as a GPT-like autoregressive model and as a DiT diffusion model https://t.co/eWkQxg0E8S
ByteDance's Seed team has introduced BAGEL, a 7-billion-parameter open-source multimodal foundation model that integrates text, image, video, and 3D understanding and generation capabilities into a single unified decoder-only architecture. Released under the Apache 2.0 license, BAGEL outperforms leading vision-language models such as Qwen 2.5-VL and InternVL-2.5. The model employs a Mixture-of-Transformer-Experts (MoT) approach with dual visual encoders and is pretrained on trillions of interleaved multimodal tokens. ByteDance also published a detailed 37-page report outlining BAGEL's "Integrated Transformer" architecture, in which the same backbone functions both as a GPT-like autoregressive model and as a DiT diffusion model. BAGEL's open-source nature allows fine-tuning, distillation, and deployment across a range of applications, providing functionality comparable to proprietary systems like GPT-4o and Gemini 2.0 while enabling image generation through its native multimodal design.
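To make the "Integrated Transformer" idea concrete, here is a minimal PyTorch sketch of a single shared backbone driving both a GPT-like autoregressive text head and a DiT-style diffusion head. This is an illustration under stated assumptions, not BAGEL's actual implementation: the module names, dimensions, timestep conditioning, and plain `TransformerEncoder` backbone are all simplifications of the real design.

```python
# Sketch of an "Integrated Transformer": one shared backbone used both as a
# GPT-like autoregressive model (causal mask, next-token logits) and as a
# DiT-like diffusion model (bidirectional attention over noisy latents,
# conditioned on a timestep). All names and sizes here are illustrative.
import torch
import torch.nn as nn

class IntegratedTransformer(nn.Module):
    def __init__(self, vocab_size=32000, latent_dim=16, d_model=512,
                 n_layers=6, n_heads=8):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)   # text tokens
        self.latent_proj = nn.Linear(latent_dim, d_model)    # noisy image latents
        self.time_mlp = nn.Sequential(                       # diffusion timestep
            nn.Linear(1, d_model), nn.SiLU(), nn.Linear(d_model, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)  # shared weights
        self.lm_head = nn.Linear(d_model, vocab_size)        # next-token logits
        self.noise_head = nn.Linear(d_model, latent_dim)     # predicted noise

    def forward_text(self, tokens):
        # Autoregressive pass: causal mask so each position sees only the past.
        h = self.token_emb(tokens)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.lm_head(self.backbone(h, mask=causal))

    def forward_diffusion(self, noisy_latents, t):
        # Diffusion pass: full attention over latent patches plus a timestep
        # embedding; the head predicts the noise to remove, DiT-style.
        h = self.latent_proj(noisy_latents) + self.time_mlp(
            t.float().unsqueeze(-1)).unsqueeze(1)
        return self.noise_head(self.backbone(h))

model = IntegratedTransformer()
logits = model.forward_text(torch.randint(0, 32000, (2, 10)))          # (2, 10, 32000)
eps = model.forward_diffusion(torch.randn(2, 64, 16), torch.rand(2))   # (2, 64, 16)
```

Per the report, BAGEL goes further than this sketch: its MoT design routes understanding and generation tokens through separate expert parameters while they still attend to each other in one shared sequence; the sketch collapses that into a single set of weights to keep the core routing idea visible.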