Contextual AI, in collaboration with the Allen Institute for AI, has released OLMoE, a state-of-the-art, fully open-source Mixture-of-Experts (MoE) language model. OLMoE, led by Niklas Muennighoff, has roughly 7 billion total parameters but activates only about 1 billion per input token, making it highly efficient. The model was pretrained on 5 trillion tokens and is designed to rival more costly dense models such as Gemma and Llama. The release includes model weights, training data, code, and logs, and aims to provide a cost-effective yet powerful tool for language model research and application. Part of the OLMo family, OLMoE offers a superior performance-to-cost ratio, using 64 experts per layer with 8 active for each token.
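The routing scheme described above (64 experts per layer, 8 selected per token) can be pictured with a short PyTorch sketch of generic top-k MoE routing. This is an illustrative simplification under assumed dimensions, not the actual OLMoE code; the class name and layer sizes below are hypothetical.

```python
# A simplified sketch of top-k mixture-of-experts routing: 64 experts per layer,
# 8 chosen per token. NOT the OLMoE implementation; dimensions and names are
# hypothetical placeholders chosen only to make the example runnable.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_ff=2048, n_experts=64, k=8):
        super().__init__()
        self.k = k
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (n_tokens, d_model)
        logits = self.router(x)                        # (n_tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)     # keep the 8 best experts per token
        weights = F.softmax(weights, dim=-1)           # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                     # only selected experts are evaluated,
            for e in idx[:, slot].unique().tolist():   # so most parameters stay inactive
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, 1024)          # four token embeddings
print(TopKMoELayer()(tokens).shape)    # torch.Size([4, 1024])
```

This captures why an MoE model can have many total parameters while keeping per-token compute close to that of a much smaller dense model: only the routed experts run for each token.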
OLMoE 1x7B by @allen_ai - apache licensed, 64 experts, 8 active - trained on 5T tokens, matches Gemma, Llama in perf, with orders of magnitude faster speed ⚡
> 1.3B Active and 6.9B Total - 5x fewer parameters than the comparative dense model
> 64 experts per layer, 8 active… https://t.co/qabrRX7wYC
🐣Welcome the newest member to the OLMo family, OLMoE! This Mixture-of-Experts model is 100% open — it's efficient, performant, and ready for experimentation. Learn more on our blog: https://t.co/K6OAVEBmFx https://t.co/32O2AFjHJO
We’re proud to share our latest research, led by our own @Muennighoff and in partnership with @allen_ai: Introducing OLMoE, a best-in-class fully open source mixture-of-experts (MoE) language model with 1B active parameters that beats comparable LLMs and rivals many larger… https://t.co/R0ebuDygVE