MistralAI has released a new 8x22B Mixture of Experts (MoE) model, sparking discussion and analysis within the tech community. The model, notable for its size, features a 65k context window, and early speculation suggests it could outperform existing models such as DBRX and even ChatGPT-3.5, though it may still fall short of more advanced models. Technical details shared by enthusiasts indicate that the tokenizer is identical to that of Mistral's earlier 7B model and that the base model dimensions have grown. The hardware requirements are substantial: at 16-bit precision the weights need roughly 258GB of VRAM, though quantization can cut this to 73GB with 4-bit BitsAndBytes, or to 58GB with a mixed 4-bit/2-bit HQQ configuration that fits on a single H100. The release has also raised questions about MistralAI's product lineup and strategy, particularly how this model fits into it and what it signals for future development.
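As a rough sanity check on those memory figures, here is the back-of-the-envelope arithmetic. The 258GB number for 16-bit weights comes from the analysis quoted below; working backwards it implies roughly 129B total parameters. The bytes-per-parameter values for the quantized variants are approximations, since quantization scales and zero-points add some overhead on top of the raw weights.

```python
# Rough VRAM arithmetic for the 8x22B weights, taking the quoted 258 GB
# 16-bit figure as the starting point (an assumption, not an official spec).
GB = 1e9  # decimal gigabytes, matching the tweet's round numbers

total_params = 258 * GB / 2  # 16-bit = 2 bytes per parameter, so ~129B params

def weight_vram_gb(bytes_per_param: float) -> float:
    """Memory for the weights alone; activations and KV cache come on top."""
    return total_params * bytes_per_param / GB

print(f"16-bit : {weight_vram_gb(2.0):.0f} GB")  # ~258 GB
print(f"4-bit  : {weight_vram_gb(0.5):.0f} GB")  # ~65 GB, vs. the ~73 GB quoted for BnB 4-bit
# A mixed scheme (4-bit attention, 2-bit expert MLPs, as with HQQ) lands lower
# still, ~58 GB per the tweet, because the expert MLPs hold most of an
# MoE model's parameters.
```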
New Mixtral 8x22B MoE model with 64k Context is out in the wild. https://t.co/ZbJUkcncRL
Mistral just dropped the weights of what seems to be a 8x22B MoE with 65k context size. The license is unknown but previous releases were permissive. We will soon see how good the model is, but based on the size, it should beat DBRX, so be better than ChatGPT-3.5 but weaker than… https://t.co/Own8r9Umcn
Can't download @MistralAI's new 8x22B MoE, but managed to check some files! 1. Tokenizer identical to Mistral 7b 2. Mixtral (4096,14336) New (6144,16K), so larger base model used. 4. 16bit needs 258GB of VRAM. BnB 4bit 73GB. HQQ 4bit attention, 2 bit MLP 58GB VRAM => H100 fits!… https://t.co/NfGzzdOuIN https://t.co/AyWRNvVvYO
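For readers who want to try the 4-bit route mentioned above, below is a minimal sketch using Hugging Face transformers with a bitsandbytes 4-bit configuration. The repository id is an assumed placeholder for illustration (the weights were initially distributed as a torrent rather than through a confirmed Hub repo), and actual memory usage will depend on hardware and library versions.

```python
# Minimal sketch: loading an 8x22B-class checkpoint with bitsandbytes 4-bit
# quantization via transformers. MODEL_ID is an assumed placeholder, not an
# official repo confirmed by the release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "mistral-community/Mixtral-8x22B-v0.1"  # hypothetical repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # NF4 weights, ~0.5 bytes per parameter
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # run the matmuls in bf16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",  # shard across whatever GPUs (and CPU RAM) are available
)

prompt = "Mixture-of-Experts models route each token to"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```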