Recent work has produced state-of-the-art quantized reasoning models built on the DeepSeek-R1-Distill suite. An experimental 3.8-billion-parameter model has demonstrated reasoning performance comparable to, or exceeding, that of larger models such as DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Llama-8B. The quantized models use FP8 and INT8 quantization and achieve near-perfect accuracy recovery across a range of reasoning benchmarks. Separately, the Tiny-R1-32B-Preview model has been released: it outperforms DeepSeek-R1-Distill-70B and nearly matches the full R1 model on mathematical reasoning. Tiny-R1 was developed by researchers from Peking University and Qihoo 360, and the team plans to release its training and evaluation code soon. The common thread across these releases is making reasoning models more efficient, in particular scaling inference compute while minimizing performance loss.
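To give a rough sense of why INT8 weight quantization can recover accuracy almost exactly, here is a minimal NumPy sketch of symmetric per-channel quantization and dequantization with a round-trip error check. This is an illustration of the general idea only, not the recipe used for the released quantized DeepSeek-R1-Distill checkpoints; the matrix size and error threshold are arbitrary.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-output-channel INT8 quantization of a weight matrix."""
    # One scale per output channel (row), chosen so the max |w| maps to 127.
    scales = np.abs(weights).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(weights / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize_int8(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * scales

# Round-trip a random weight matrix and measure the relative error.
w = np.random.randn(4096, 4096).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative round-trip error: {rel_err:.4f}")  # roughly 1% or less for Gaussian weights
```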
Problem: Scaling test-time compute for LLM reasoning is limited by Transformer inference inefficiency. Solution: Distilling Transformers into Mamba models enables faster inference, letting them surpass Transformer reasoning under fixed compute budgets. 📌 Distillation effectively transfers Transformer reasoning to faster… https://t.co/Pd7eoRxdiR
[CL] Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners D Paliotta, J Wang, M Pagliardini, K Y. Li... [University of Geneva & Together AI & EPFL] (2025) https://t.co/M6E9iOWDvn https://t.co/XmAyYMTiXJ
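The fixed-budget argument above comes down to this: a cheaper model can afford more samples per question for the same compute, and aggregating those samples (e.g. by majority voting) improves accuracy. Below is a generic sketch of majority voting over N sampled answers; the `sample_answer` callable is a placeholder, not the paper's actual evaluation code.

```python
import random
from collections import Counter
from typing import Callable

def majority_vote(question: str,
                  sample_answer: Callable[[str], str],
                  n_samples: int) -> str:
    """Sample n_samples candidate answers and return the most common one.

    A faster model (e.g. a distilled Mamba reasoner) can afford a larger
    n_samples under the same wall-clock or FLOP budget, which is how cheaper
    inference translates into better accuracy at fixed compute.
    """
    answers = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Stubbed sampler for demonstration; in practice sample_answer would call the
# model with temperature > 0 and extract the final answer from its reasoning chain.
dummy_sampler = lambda q: random.choice(["42", "42", "41"])
print(majority_vote("What is 6 * 7?", dummy_sampler, n_samples=16))
```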
Tiny-R1-32B-Preview 🔥 reasoning model that outperforms Deepseek-R1-Distill-70B and nearly matches the full R1 in math, released by @PKU1898 & @QIHU_Official https://t.co/M40UjC5RPt ✨ Built with SFT + R1-generated responses ✨ Will release training and evaluation code, selected… https://t.co/JO6Ci99CIe
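The "SFT + R1-generated responses" recipe mentioned in the Tiny-R1 announcement is ordinary supervised fine-tuning on distilled reasoning traces. Here is a minimal sketch using Hugging Face TRL, written before the official training code is released: the dataset file, hyperparameters, and base model identifier are all placeholders, not details confirmed by the Tiny-R1 team.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical JSONL file of R1-generated reasoning traces with a "text" column;
# the actual Tiny-R1 data mixture has not been released.
dataset = load_dataset("json", data_files="r1_generated_traces.jsonl", split="train")

config = SFTConfig(
    output_dir="tiny-r1-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=1e-5,
)

trainer = SFTTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",  # assumed base model, for illustration only
    train_dataset=dataset,
    args=config,
)
trainer.train()
```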