Moonshot AI has announced the release of Kimi-VL, a lightweight yet capable vision-language model (VLM) with strong multimodal reasoning. The release includes both a mixture-of-experts (MoE) VLM and an MoE reasoning VLM, each running with roughly 3 billion activated parameters, and it reportedly outperforms far larger models, including GPT-4o, on several vision and math benchmarks. Separately, MegaMath, an open-source math-reasoning pre-training corpus of 371 billion tokens of high-quality web, code, and synthetic data, was also released; it surpasses the earlier DeepSeekMath corpus in scale and is reported to improve math-reasoning performance by up to 20%. Both Kimi-VL and MegaMath are available under the MIT license.
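Since Kimi-VL ships as an open-weight model on Hugging Face, it can in principle be queried through the `transformers` library. The sketch below shows the general shape of such a call: the repo id `moonshotai/Kimi-VL-A3B-Instruct`, the `trust_remote_code` flag, and the exact chat-message layout are assumptions based on common VLM conventions, not verified against the model card.

```python
# Sketch: querying a Kimi-VL-style model via Hugging Face transformers.
# The model id and message layout below are assumptions, not verified API docs.

def build_vl_messages(image_path: str, question: str) -> list[dict]:
    """Build one multimodal user turn: an image followed by a text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

def run_inference(image_path: str, question: str) -> str:
    """Heavy path (downloads ~16B total / ~3B active params); shown for shape only."""
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "moonshotai/Kimi-VL-A3B-Instruct"  # assumed HF repo id
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, device_map="auto"
    )
    messages = build_vl_messages(image_path, question)
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    return processor.batch_decode(
        out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]

if __name__ == "__main__":
    print(build_vl_messages("chart.png", "What trend does this chart show?"))
```

The payload builder is separated from the inference call so the message format can be inspected without downloading weights.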
Moonshot AI just dropped Kimi-VL-A3B on Hugging Face https://t.co/imy1CuXiGi
Big release coming out of @Kimi_Moonshot - KimiVL A3B Instruct & Thinking - Multimodal LMs, 128K context AND MIT license 🔥 > Outperforms GPT4o on vision + math benchmarks 💥 > MoE VLM and an MoE Reasoning VLM with only ~3B active params > strong multimodal reasoning (36.8% on https://t.co/R2apObRI4D
Kimi just dropped a tiny 3B thinking model that beats way larger models, including GPT-4o. Impressive! https://t.co/2fN4dp9h8q