Researchers from UCLA and Meta AI have introduced d1, a framework designed to enhance reasoning in diffusion-based large language models (LLMs) by combining supervised fine-tuning (SFT) with reinforcement learning (RL). The approach is a two-stage pipeline: masked diffusion LLMs are first fine-tuned on a curated set of 1,000 reasoning examples, then trained with diffu-GRPO, a critic-free policy-gradient method. diffu-GRPO estimates sequence log-probabilities in a single forward pass rather than through costly Monte Carlo rollouts, which makes policy-gradient updates practical for masked diffusion models and improves the accuracy of their step-by-step reasoning.

Concurrently, work on retrieval-augmented generation (RAG) emphasizes the core simplicity of the pattern: retrieve context relevant to a query, then condition generation on it. Agentic RAG pipelines extend this with query analysis and reranking stages to improve the retrieved results, as sketched in the example below. Additional studies examine scaling LLM output lengths, multilingual reasoning, and transferring reasoning skills from large models to smaller ones through knowledge distillation and chain-of-thought fine-tuning. Together, these efforts aim to improve the reasoning accuracy and efficiency of LLMs across applications and languages.
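A minimal sketch of that agentic RAG flow (query analysis, retrieval, reranking, generation), assuming placeholder embed/rerank/generate helpers; the function names and toy scoring here are illustrative and not taken from any of the cited papers:

```python
# Sketch of an agentic RAG pipeline: query analysis -> retrieval -> reranking -> generation.
# embed(), rerank(), and generate() are hypothetical stand-ins for a real embedding model,
# cross-encoder reranker, and LLM.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedder: normalized bag-of-characters vector."""
    vec = np.zeros(64)
    for i, ch in enumerate(text.lower()):
        vec[(hash(ch) + i) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def analyze_query(query: str) -> str:
    """Query-analysis step: rewrite or expand the query before retrieval."""
    return query.strip().rstrip("?")

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Dense retrieval: rank documents by cosine similarity to the query embedding."""
    q = embed(query)
    return sorted(corpus, key=lambda d: float(embed(d) @ q), reverse=True)[:k]

def rerank(query: str, docs: list[str]) -> list[str]:
    """Reranking step: a toy lexical-overlap score standing in for a cross-encoder."""
    q_tokens = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q_tokens & set(d.lower().split())), reverse=True)

def generate(query: str, context: list[str]) -> str:
    """Placeholder generator: a real pipeline would prompt an LLM with the reranked context."""
    return f"Answer to '{query}' grounded in: {context[0]}"

corpus = [
    "Diffusion LLMs denoise masked tokens over several steps.",
    "GRPO is a critic-free policy-gradient method.",
    "RAG retrieves relevant context before generation.",
]
query = "How does RAG work?"
docs = rerank(query, retrieve(analyze_query(query), corpus))
print(generate(query, docs))
```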
This survey paper explores efficient Small Reasoning Models (SRMs, <10B parameters). 📌 Knowledge distillation effectively transfers complex reasoning from Large Models to efficient Small Reasoning Models. 📌 Combining Chain-of-Thought Supervised Fine-Tuning and Reinforcement Learning https://t.co/I6OSjkjKcl
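As a rough illustration of the distillation recipe the survey points to, here is a minimal sketch of a combined soft-target/hard-target loss in PyTorch; the temperature, mixing weight, and tensor shapes are illustrative assumptions, not details from the paper.

```python
# Minimal knowledge-distillation sketch: the student is trained on a blend of
# hard-label cross-entropy (e.g., chain-of-thought SFT targets) and a KL term
# matching the teacher's temperature-softened token distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    # Soft targets: KL(teacher || student) at temperature T, scaled by T^2.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy on the ground-truth (CoT) tokens.
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kd + (1 - alpha) * ce

# Toy usage: a batch of 4 token positions over a 10-token vocabulary.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```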
[CL] Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time W Yang, X Yue, V Chaudhary, X Han [Case Western Reserve University & CMU] (2025) https://t.co/zlf2DdpkKI https://t.co/Op7GYyG7R5
[LG] Sleep-time Compute: Beyond Inference Scaling at Test-time K Lin, C Snell, Y Wang, C Packer... [Letta & UC Berkeley] (2025) https://t.co/An2IQ3trRG https://t.co/Sl7eXOPmrl