Discover new NVIDIA NIMs for Mistral-7B, Mixtral-8x7B, and Mixtral-8x22B designed to meet diverse business needs. Learn about the strengths of each foundation model for specific tasks. Read more > https://t.co/11VD9zfUHh https://t.co/OJHNX2Adw3
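For a sense of how these NIMs are consumed, here is a minimal sketch calling a hosted NIM through its OpenAI-compatible API with the standard openai Python client; the endpoint URL, model id, and environment variable name are assumptions for illustration, not details from the post above.

```python
# Hedged sketch of calling a hosted NIM through its OpenAI-compatible
# endpoint. The base URL, model id, and NVIDIA_API_KEY env var are
# assumptions for illustration; check the NIM catalog for exact values.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed hosted-NIM endpoint
    api_key=os.environ["NVIDIA_API_KEY"],            # assumed env var name
)

resp = client.chat.completions.create(
    model="mistralai/mixtral-8x7b-instruct-v0.1",    # assumed model id
    messages=[{"role": "user",
               "content": "When would Mixtral-8x22B be a better fit than Mistral-7B?"}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```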
Pretty awesome - Finetuning NeMo 12B (from @MistralAI + @nvidia) fits in 12GB of VRAM, is 2x faster, and uses 60% less VRAM, with no accuracy degradation, and works for free in a Google Colab with the @UnslothAI lib. Image and colab from @danielhanchen; google colab link in… https://t.co/GwrhIO3EuQ
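As a rough illustration of the recipe in the tweet above, here is a minimal Unsloth fine-tuning sketch. The 4-bit checkpoint name, the placeholder dataset, and the hyperparameters are assumptions, and the trainer call follows the style of Unsloth's Colab notebooks (exact trl arguments vary by version); the linked Colab remains the authoritative reference.

```python
# Hedged sketch of the Unsloth fine-tuning recipe described above.
# Assumptions (not from the tweets): the 4-bit checkpoint repo id, the
# placeholder dataset, and the LoRA/training hyperparameters.
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Load Mistral NeMo 12B in 4-bit so the fine-tune fits in ~12GB of VRAM
# (e.g. the free Colab GPU mentioned in the tweet).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",  # assumed repo id
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder instruction dataset, flattened into a single "text" field.
dataset = load_dataset("yahma/alpaca-cleaned", split="train")
dataset = dataset.map(lambda ex: {
    "text": f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['output']}"
})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```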

Mistral NeMo, a new AI model with 12 billion parameters and a 128K context window, is now available on MLX. Users can run inference or train the model locally on a Mac by installing the necessary packages: 'pip install -U fastmlx' plus 'mlx-lm' built from source (PR #895). The model supports function calling and runs at 37 tokens per second on an M3 Max.

Mistral NeMo is also integrated with NVIDIA's NIM inference microservice, offering performance-optimized inference with TensorRT-LLM engines. Fine-tuning requires only 12GB of VRAM and, with the Unsloth library, is twice as fast and 60% more memory-efficient with no accuracy loss; it can be done for free on Google Colab.

A quantized Mistral NeMo Instruct (Q4) variant can be run locally with minimal code, and NIMs for Mistral-7B, Mixtral-8x7B, and Mixtral-8x22B round out a lineup designed to meet diverse business needs.
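As a rough illustration of the MLX workflow summarized above, here is a minimal local-inference sketch using mlx-lm on Apple silicon; the quantized repo id and generation settings are assumptions rather than details from the summary.

```python
# Hedged sketch of local inference on a Mac with mlx-lm; the 4-bit
# community conversion repo id and the generation settings are
# illustrative assumptions, not taken from the summary above.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-Nemo-Instruct-2407-4bit")  # assumed repo id

messages = [{"role": "user",
             "content": "Summarize Mistral NeMo's key features in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# On an M3 Max this class of model is reported to run at roughly 37 tok/s.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
```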