In an era where inference speed yields more reinforcement learning, which yields more AI, we feel hybrid SSM-transformer models have some compelling advantages. Nemotron-H-47B-Reasoning-128k is a bit more accurate than Llama-Nemotron-Super-49B-1.0 across all benchmarks, but it https://t.co/ZSVFMs18Kz
Transformers still dominate the LLM scene, but we show that higher-throughput alternatives exist that are just as strong! Grateful to have played a part in the Nemotron-H Reasoning effort. 🙏 Technical report will be out soon, stay tuned! https://t.co/FWncfFGYkH
👀 Nemotron-H tackles large-scale reasoning while maintaining speed -- with 4x the throughput of comparable transformer models. ⚡ See how #NVIDIAResearch accomplished this using a hybrid Mamba-Transformer architecture and model fine-tuning ➡️ https://t.co/AuHYANG9gX https://t.co/9uUwiB8ejp
Nvidia recently released Nemotron-H, a hybrid Mamba-Transformer model family designed for large-scale reasoning with four times the throughput of comparable transformer models. The Nemotron-H-47B-Reasoning-128k variant demonstrates slightly higher accuracy than Llama-Nemotron-Super-49B-1.0 across various benchmarks.

Meanwhile, the OpenThinker team launched OpenThinker3, a 7-billion-parameter (7B) model trained solely with supervised fine-tuning (SFT) on 1.2 million reasoning traces covering math, coding, and science. OpenThinker3 reportedly outperforms all open 7B and 8B models, including those trained with reinforcement learning, and surpasses Nvidia's Nemotron as well as GPT-4.1 on reasoning tasks. The model is available for local deployment via Hugging Face and LocalAI.

Additionally, LocalAI announced the availability of Ultravox, a multimodal speech large language model (LLM) based on Llama 3.2, along with a variant combining Llama 3.1 with Whisper for speech and text input. Another new model, nbeerbower_qwen3-gutenberg-encore-14B, a Qwen3 model fine-tuned for text generation, was also introduced on LocalAI. These developments highlight ongoing advances in open-source and hybrid AI models focused on reasoning capability and inference speed.
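The hybrid recipe the tweets refer to is straightforward to picture: most layers are state-space (Mamba-style) blocks that carry a fixed-size recurrent state, with a handful of self-attention layers interleaved for exact token-to-token mixing. The sketch below is illustrative only, not Nvidia's implementation; the layer ratio, module names, and the stub SSM internals are assumptions, and a real model would use a selective state-space layer (e.g. from the mamba_ssm package) in place of the stand-in.

```python
# Illustrative hybrid SSM-transformer stack (NOT Nvidia's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StubSSMBlock(nn.Module):
    """Stand-in for a Mamba-style block: gated causal depthwise conv.
    A real selective SSM keeps O(1) state per token at inference time,
    which is where the throughput advantage over attention comes from."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        # Pad-and-truncate trick makes the depthwise conv causal.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=4,
                              padding=3, groups=d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        h, gate = self.in_proj(self.norm(x)).chunk(2, dim=-1)
        h = self.conv(h.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return x + self.out_proj(F.silu(gate) * h)

class AttentionBlock(nn.Module):
    """Standard causal self-attention, kept for the few layers where
    exact token-to-token lookups matter (e.g. long-context retrieval)."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        mask = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool,
                                     device=x.device), diagonal=1)
        out, _ = self.attn(h, h, h, attn_mask=mask)
        return x + out

class HybridStack(nn.Module):
    """Mostly SSM blocks with sparse attention layers; the 1-in-6 ratio
    here is an assumption chosen purely for illustration."""
    def __init__(self, d_model: int = 256, n_layers: int = 12,
                 attn_every: int = 6):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(d_model) if (i + 1) % attn_every == 0
            else StubSSMBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 128, 256)
print(HybridStack()(x).shape)   # torch.Size([2, 128, 256])
```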
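Since the summary mentions local deployment via Hugging Face, a minimal sketch of loading OpenThinker3 with the transformers library follows. The repo id `open-thoughts/OpenThinker3-7B` and the generation settings are assumptions here; check the model card for the exact name, chat template, and recommended sampling parameters.

```python
# Hedged example: running OpenThinker3 locally with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-thoughts/OpenThinker3-7B"   # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"  # device_map needs accelerate
)

messages = [{"role": "user",
             "content": "If 3x + 7 = 22, what is x? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so leave generous headroom.
output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```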