
Mistral AI has launched Mixtral 8x22B, a new sparse mixture-of-experts (MoE) model and a significant step up from Mixtral 8x7B, released in December 2023. Mixtral 8x22B has 176 billion total parameters, and discussion on the Mistral Discord places its performance between GPT-4 and Claude Sonnet. It uses the same or a similar tokenizer as Mistral 7B, supports a 65,536-token sequence length with 8 experts, and requires roughly 260GB of VRAM in fp16 or about 73GB when quantized to 4-bit with bitsandbytes (bnb). The model uses RoPE positional embeddings and a 32,000-token vocabulary, and it is already available through several channels, including a 4-bit quantized build running via MLX on an M2 Ultra, Hugging Face, and Fireworks, with support for fine-tuning and downstream AI tasks. The release has generated significant interest in the AI community, with discussion of potential applications and performance benchmarks, including a notable 77% on MMLU.
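For readers who want to poke at the weights themselves, here is a minimal sketch of loading the model from Hugging Face with 4-bit bitsandbytes quantization (the route behind the ~73GB figure above). The repo id and generation settings are assumptions for illustration; verify the exact id on the Hugging Face hub.

```python
# Sketch: load Mixtral 8x22B in 4-bit via bitsandbytes (repo id assumed; check the HF hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x22B-v0.1"  # assumed repo id for the base model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights, ~73GB per the reported numbers
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # shard the experts across available GPUs
)

# Base model, not instruct-tuned: plain text continuation, not chat prompting.
prompt = "A sparse mixture-of-experts language model works by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```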

Fireworks now offers the first (to our knowledge) hosted, instruct variant of Mixtral 8x22B! Try it at https://t.co/accH6AJGCB or download it at https://t.co/8KZytSdvpp Thanks @Teknium1 and @NousResearch for the great dataset! https://t.co/jlYwoxO7Uz https://t.co/D58o51uJ28
Mixtral 8x22B is an exciting launch but is not yet ready for production use for most use-cases. The version of the model released by Mistral is the base model and is not instruct/chat fine-tuned. This means that it isn’t designed for the prompt & answer style that most… https://t.co/S20MM9RO7J
The latest Mixtral 8x22B is really, really bad.
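For those who would rather try the Fireworks-hosted instruct variant mentioned above than run the base weights locally, a minimal sketch using an OpenAI-compatible client follows. The base URL is Fireworks' published inference endpoint, but the model id is an assumption; confirm both in the Fireworks documentation.

```python
# Sketch: query the Fireworks-hosted Mixtral 8x22B instruct variant.
# Model id below is an assumption; check the Fireworks model catalog.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # Fireworks' OpenAI-compatible endpoint
    api_key="YOUR_FIREWORKS_API_KEY",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x22b-instruct",  # assumed model id
    messages=[
        {"role": "user", "content": "In two sentences, what is a sparse mixture-of-experts model?"}
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```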