
Mistral AI has launched Mixtral 8x22B, a new sparse mixture-of-experts (MoE) model and a significant step up from Mixtral 8x7B, released in December 2023. Mixtral 8x22B has 176 billion total parameters, and discussion on the Mistral Discord places its performance between GPT-4 and Claude Sonnet. It uses the same or a similar tokenizer as Mistral 7B, supports a 65,536-token sequence length with 8 experts, and requires roughly 260GB of VRAM in fp16 or about 73GB when quantized to 4-bit with bitsandbytes (bnb). The model uses RoPE positional embeddings and a 32,000-token vocabulary, and it is already available through several channels, including a 4-bit quantized build running via MLX on an M2 Ultra, Hugging Face, and Fireworks, with support for fine-tuning and downstream AI tasks. The release has generated significant interest in the AI community, with discussion of potential applications and performance benchmarks, including a notable 77% on MMLU.
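For readers who want to poke at the weights themselves, here is a minimal sketch of loading the model from Hugging Face with 4-bit bitsandbytes quantization (the route behind the ~73GB figure above). The repo id and generation settings are assumptions for illustration; verify the exact id on the Hugging Face hub.

```python
# Sketch: load Mixtral 8x22B in 4-bit via bitsandbytes (repo id assumed; check the HF hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x22B-v0.1"  # assumed repo id for the base model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights, ~73GB per the reported numbers
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # shard the experts across available GPUs
)

# Base model, not instruct-tuned: plain text continuation, not chat prompting.
prompt = "A sparse mixture-of-experts language model works by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```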

Fireworks now offers the first (to our knowledge) hosted, instruct variant of Mixtral 8x22B! Try it at https://t.co/accH6AJGCB or download it at https://t.co/8KZytSdvpp Thanks @Teknium1 and @NousResearch for the great dataset! https://t.co/jlYwoxO7Uz https://t.co/D58o51uJ28
Mixtral 8x22B is an exciting launch but is not yet ready for production use for most use-cases. The version of the model released by Mistral is the base model and is not instruct/chat fine-tuned. This means that it isn’t designed for the prompt & answer style that most… https://t.co/S20MM9RO7J
The latest Mixtral 8x22B is really, really bad.
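For those who would rather try the Fireworks-hosted instruct variant mentioned above than run the base weights locally, a minimal sketch using an OpenAI-compatible client follows. The base URL is Fireworks' published inference endpoint, but the model id is an assumption; confirm both in the Fireworks documentation.

```python
# Sketch: query the Fireworks-hosted Mixtral 8x22B instruct variant.
# Model id below is an assumption; check the Fireworks model catalog.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # Fireworks' OpenAI-compatible endpoint
    api_key="YOUR_FIREWORKS_API_KEY",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x22b-instruct",  # assumed model id
    messages=[
        {"role": "user", "content": "In two sentences, what is a sparse mixture-of-experts model?"}
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```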