InternVL 2.5: New Open-Source Multimodal Model from researchers at several Chinese universities, released under the MIT license. The 38B version surpasses GPT-4o and Claude 3.5 Sonnet on the MMMU benchmark (Massive Multi-discipline Multimodal Understanding), and the 78B version comes close to the o1 MMMU score. https://t.co/WFgx3XAj7O
🌟 Shanghai AI Lab’s OpenGVLab Drops a Game-Changer! 🌟 The InternVL 2.5 multimodal model is now open-source 🎉 and smashes past 70% on MMMU—comparable to GPT-4o & Claude-3.5-Sonnet 🚀. Here’s what makes it special: 1️⃣ 7 Model Sizes: 1B–78B, supporting images (single/multi),… https://t.co/uG65GlokR5
🥳We have released InternVL2.5, ranging from 1B to 78B, on @huggingface. 😉InternVL2_5-78B is the first open-source #MLLM to achieve over 70% on the MMMU benchmark, matching the performance of leading closed-source commercial models like GPT-4o. 🤗HF Space:… https://t.co/V5TEUq1SYG
This week saw significant advancements in open-source AI, particularly with the release of the InternVL 2.5 multimodal model by OpenGVLab. The model is available in sizes ranging from 1 billion to 78 billion parameters, and its 78B variant is the first open-source multimodal large language model (MLLM) to exceed 70% on the Massive Multi-discipline Multimodal Understanding (MMMU) benchmark, placing InternVL 2.5 alongside leading closed-source models such as GPT-4o and Claude-3.5-Sonnet. The model supports both single- and multi-image inputs and is released under the MIT license. Researchers from several Chinese universities contributed to the work, and the 38-billion-parameter version reportedly already surpasses GPT-4o and Claude-3.5-Sonnet on the same benchmark.
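For readers who want to try the checkpoints announced above, here is a minimal sketch of loading one of them from Hugging Face with the transformers library. The repo id, dtype, and GPU placement are assumptions rather than the official recipe, and the actual image-preprocessing and chat interface come from the model's own remote code, so the model card should be treated as the authoritative reference.

```python
# Minimal sketch (not the official recipe): load an InternVL 2.5 checkpoint
# from Hugging Face with transformers. The repo id below is an assumption;
# pick any size in the 1B-78B family that fits your hardware.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "OpenGVLab/InternVL2_5-8B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to keep memory manageable
    trust_remote_code=True,      # InternVL ships its own modeling code
).eval().cuda()

# Image preprocessing and the chat-style inference call are defined by the
# model's remote code; consult the Hugging Face model card for the exact API.
```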