New open Omni model released! 👀@OpenBMB MiniCPM-o 2.6 is a new 8B-parameter, any-to-any multimodal model that can understand vision, speech, and language and run on edge devices like phones and tablets. TL;DR: 🧠 8B total parameters (SigLip-400M + Whisper-300M + ChatTTS-200M… https://t.co/0pXaWHaZO2
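The parameter breakdown above is truncated, but the quoted component sizes can be sanity-checked with a quick back-of-envelope sum. The size of the language backbone is not stated in the post, so the ~7.1B figure below is an assumption chosen to match the quoted 8B total:

```python
# Back-of-envelope parameter budget for MiniCPM-o 2.6.
# Encoder/decoder sizes (in millions) are taken from the announcement;
# the language-backbone size is an ASSUMPTION, not from the source.
components_m = {
    "SigLip vision encoder": 400,
    "Whisper audio encoder": 300,
    "ChatTTS speech decoder": 200,
    "language backbone (assumed)": 7_100,  # assumption to reach the quoted 8B
}

total_b = sum(components_m.values()) / 1000  # millions -> billions
print(f"approximate total: {total_b:.1f}B parameters")
```

With those figures the components sum to roughly the advertised 8B-class footprint, which is what makes on-device deployment plausible.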
MiniCPM-o 2.6: An 8B size, GPT-4o level Omni Model runs on device https://t.co/5mKjdwtOSA
Fine-tuning of MiniCPM-o is now available at LLaMA-Factory🤗 https://t.co/6HADHlM6Hn
OpenBMB has released MiniCPM-o 2.6, an 8 billion parameter multimodal large language model (LLM) that reportedly outperforms competitors such as GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet on various tasks. The model achieves an average score of 70.2 on OpenCompass visual benchmarks while using 75% fewer vision tokens than comparable models, and it also performs strongly at bilingual real-time speech recognition. MiniCPM-o 2.6 supports over 30 languages and handles real-time video and audio understanding. It is designed to run on edge devices, including smartphones and tablets. Additionally, fine-tuning for the model is now supported in LLaMA-Factory.
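LLaMA-Factory training runs are typically driven by a YAML config passed to `llamafactory-cli train`. A minimal LoRA SFT sketch for this model might look like the following; the model id, template name, and dataset are illustrative assumptions, not details confirmed by the posts above:

```yaml
# Hypothetical LLaMA-Factory LoRA SFT config for MiniCPM-o 2.6.
# The model id, template, and dataset names are assumptions; check the
# LLaMA-Factory docs for the officially supported values.
model_name_or_path: openbmb/MiniCPM-o-2_6
trust_remote_code: true

stage: sft
do_train: true
finetuning_type: lora
lora_target: all

dataset: mllm_demo            # replace with your own multimodal dataset
template: minicpm_o           # assumed template name
cutoff_len: 2048

output_dir: saves/minicpm-o-2_6-lora
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
bf16: true
```

A config like this would be launched with `llamafactory-cli train <config>.yaml`; LoRA keeps the fine-tune lightweight enough to be practical for an 8B-class model on a single GPU.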