
Mistral AI has released a new 8x22B Mixture of Experts (MoE) model, Mixtral 8x22B, marking a significant advancement in its open-source large language models (LLMs). The model, with 176 billion parameters and a 65k context length, is designed to be fine-tuned on custom datasets. It is accessible for testing in alpha via public cloud services and is expected to deliver performance between GPT-4 and Claude Sonnet. It uses the same (or a very similar) tokenizer as Mistral 7B and introduces a larger base model for enhanced capabilities. Running it is estimated to require roughly 260GB of VRAM, with detailed comparisons to other state-of-the-art models anticipated soon. The release coincides with Google Cloud Next, highlighting the model's potential for broad application and research in AI.
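For a sense of where the ~260GB VRAM figure comes from, here is a minimal back-of-the-envelope sketch, assuming we only count memory for the weights themselves (no KV cache, activations, or framework overhead). The parameter counts and precisions below are illustrative; the quoted ~260GB presumably reflects fp16/bf16 weights and an effective total somewhat below the naive 8 x 22B, since the non-expert (attention and embedding) layers of a MoE model are shared across experts.

```python
# Sketch: rough VRAM needed just to hold model weights, under the assumptions above.
def weight_memory_gb(num_params_billions: float, bytes_per_param: float) -> float:
    """Gigabytes required to store the weights alone."""
    return num_params_billions * bytes_per_param  # 1e9 params * bytes / 1e9 bytes-per-GB

params_b = 176.0  # naive 8 x 22B total; the true shared-layer total would be lower
for precision, nbytes in (("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)):
    print(f"{params_b:.0f}B params @ {precision}: ~{weight_memory_gb(params_b, nbytes):.0f} GB")
```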
mixtral 8x22B - things we know so far 🫡
> 176B parameters
> performance in between gpt4 and claude sonnet (according to their discord)
> same/similar tokeniser used as mistral 7b
> 65536 sequence length
> 8 experts, 2 experts per token: More
> would require ~260GB VRAM in… https://t.co/O1neBb9Nxs
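The "8 experts, 2 experts per token" point above is standard top-2 MoE routing: a small router scores each token against all experts, keeps the two highest-scoring ones, and combines their outputs with softmax-normalised weights. The sketch below illustrates that idea with toy dimensions; it is not Mistral's implementation, and the layer sizes are made up for the example.

```python
# Minimal top-2 MoE layer sketch (toy sizes, not Mixtral's actual architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); each token is routed independently
        logits = self.router(x)                                # (tokens, num_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)  # keep 2 experts per token
        weights = F.softmax(weights, dim=-1)                   # renormalise over the chosen 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                          # tokens whose k-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Toy usage: 4 tokens with a small hidden size.
layer = Top2MoELayer(d_model=16, d_ff=64)
print(layer(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```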
Mistral’s surprise model is 8x22B, which is 176 billion params, like GPT-3.5. Very excited about this given that Mixtral, at 56B, is already close to / surpassing 3.5. So the new Mistral will be even better, perhaps approaching GPT-4.
👀 8x22b Mixtral, 65K context window https://t.co/5vga8JRVJp
