The Allen Institute for AI (Ai2) has launched Molmo, a new family of open-source multimodal language models. Available in 1B, 7B, and 72B-parameter sizes, Molmo outperforms several proprietary models, including GPT-4V, GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, while training on significantly less data. The models are designed to understand and interact with multimodal data, enabling rich interactions in both the physical and virtual worlds. Molmo's performance is bolstered by the PixMo dataset, which includes high-quality image-caption pairs and multimodal instruction data; on various benchmarks Molmo also surpasses lighter-weight models such as Gemini 1.5 Flash. Ai2's CEO, Ali Farhadi, noted that this development demonstrates that open-source AI can now compete with closed, proprietary systems.
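Because the weights are openly released, a Molmo checkpoint can be queried locally. Below is a minimal sketch using Hugging Face transformers; the repo id `allenai/Molmo-7B-D-0924`, the `processor.process` helper, and `generate_from_batch` follow the conventions of Ai2's Hugging Face releases and are assumptions here, not details from the article above.

```python
# Minimal sketch: image captioning with an open Molmo checkpoint.
# Assumes the allenai/Molmo-7B-D-0924 repo on Hugging Face, which
# ships custom modeling code (hence trust_remote_code=True).
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

MODEL_ID = "allenai/Molmo-7B-D-0924"  # assumed 7B checkpoint name

processor = AutoProcessor.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Any RGB image works; this fetches a placeholder photo for the demo.
image = Image.open(
    requests.get("https://picsum.photos/536/354", stream=True).raw
)

# Molmo's processor combines the image and the text prompt into one batch.
inputs = processor.process(images=[image], text="Describe this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

# generate_from_batch is part of Molmo's custom remote code, not the
# standard transformers generate() API.
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)

# Strip the prompt tokens and decode only the newly generated text.
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```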
Molmo is a new family of state-of-the-art vision-language models (VLMs) that are fully open-source, built to rival proprietary systems like GPT-4V and Claude. Here's a breakdown: 🧵 1/ https://t.co/DhauR6fCF3
“What Molmo shows is that open-source AI development is now on par with closed, proprietary models,” says Ali Farhadi, the CEO of Ai2. Pretty cool! https://t.co/WEraiYsRX1
Are Small Language Models Really the Future of Language Models? Allen Institute for Artificial Intelligence (Ai2) Releases Molmo: A Family of Open-Source Multimodal Language Models https://t.co/d5yByp7HIT #MultimodalAI #OpenAccess #MolmoModels #AIInnovation #VisionLanguageMod… https://t.co/Kzm5zhBgDM