Tsinghua University and Ant Group have introduced AReaL-boba², an asynchronous reinforcement learning (RL) system designed to enhance large language model (LLM) reasoning. The open-source framework scales from a single GPU to large clusters and trains roughly 2.77-2.8× faster than synchronous RL baselines. AReaL-boba² natively supports agentic RL and multi-turn reasoning, enabling more efficient LLM training without extensive infrastructure changes. Concurrently, other research teams have advanced RL methodologies for LLMs, including the Writer team's GRPO approach, which improves reasoning by rewarding self-reflection, and the R3 Retrieval-Augmented Generation method, which combines RL with step-by-step information retrieval. NVIDIA has unveiled ProRL, a prolonged RL training technique that uncovers novel reasoning strategies beyond those of the base model. Additional innovations include TW-GRPO, which improves visual reasoning by weighting important tokens and granting partial credit, and MoE-X, a Mixture-of-Experts model that achieves better perplexity than GPT-2 while being more interpretable. Together, these developments mark ongoing progress in accelerating and refining RL for LLM reasoning and generalization.
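For readers unfamiliar with the GRPO family mentioned above, the following is a minimal NumPy sketch of the group-relative advantage that GRPO-style methods optimize: each prompt gets a group of sampled responses, and advantages are rewards normalized within that group rather than estimated by a value network. The function name and example rewards are illustrative only; TW-GRPO additionally weights tokens and assigns partial credit, which is not reproduced here.

```python
# Minimal sketch of a GRPO-style group-relative advantage (illustrative names,
# not taken from any of the cited papers). Each prompt has a group of sampled
# responses; advantages are rewards standardized within the group, so no
# separate value network is required.
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """rewards: shape (num_prompts, group_size), one scalar reward per sampled response."""
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled responses each.
rewards = np.array([[1.0, 0.0, 0.0, 1.0],
                    [0.2, 0.9, 0.4, 0.5]])
print(group_relative_advantages(rewards))
```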
Developing open Large Reasoning Models relies heavily on existing closed ones, limiting independent research. Short Chain-of-Thought LLMs lack the long-reasoning capabilities needed. This paper creates a large dataset of long Chain-of-Thought reasoning using short Chain-of-Thought https://t.co/wnq2DGe1eW
AReaL-boba² just dropped from Tsinghua University & Ant Group. An async RL framework built for LLM reasoning that runs on one desk-side GPU or a thousand without rewriting a line. They’re clocking ~2.8× speed over synchronous baselines, shipping 14B/7B checkpoints out of the https://t.co/AKebCWT9lX
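As a rough illustration of why asynchrony helps, the toy sketch below decouples rollout generation from training with a queue, so generation and gradient updates overlap instead of alternating in lockstep as in synchronous RL. All names (Trajectory, rollout_worker, trainer) are hypothetical and do not reflect AReaL-boba²'s actual API.

```python
# Toy sketch of the async idea behind AReaL-boba²-style systems: rollout workers
# keep generating trajectories while the trainer consumes them, instead of the
# generate-then-train lockstep of synchronous RL. All names are illustrative.
import queue, threading, time, random
from dataclasses import dataclass

@dataclass
class Trajectory:
    prompt_id: int
    reward: float
    policy_version: int   # real async RL systems use this for staleness control

traj_queue: "queue.Queue[Trajectory]" = queue.Queue(maxsize=64)
policy_version = 0

def rollout_worker(worker_id: int, num_rollouts: int) -> None:
    for i in range(num_rollouts):
        time.sleep(random.uniform(0.01, 0.05))          # stand-in for LLM generation
        traj_queue.put(Trajectory(prompt_id=worker_id * num_rollouts + i,
                                  reward=random.random(),
                                  policy_version=policy_version))

def trainer(num_updates: int, batch_size: int = 4) -> None:
    global policy_version
    for step in range(num_updates):
        batch = [traj_queue.get() for _ in range(batch_size)]   # blocks until data is ready
        time.sleep(0.02)                                        # stand-in for a gradient step
        policy_version += 1
        print(f"update {step}: mean reward {sum(t.reward for t in batch) / batch_size:.2f}")

workers = [threading.Thread(target=rollout_worker, args=(w, 8)) for w in range(2)]
train_thread = threading.Thread(target=trainer, args=(4,))
for t in workers + [train_thread]:
    t.start()
for t in workers + [train_thread]:
    t.join()
```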
Neurons in LLMs encode multiple concepts, obscuring understanding. This paper introduces MoE-X, a Mixture-of-Experts model designed for intrinsic interpretability. It achieves better perplexity than GPT-2 and surpasses sparse autoencoders in interpretability. Methods 🔧: → https://t.co/AceMaaDqCf
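For context, the sketch below shows a generic top-k sparse Mixture-of-Experts layer in PyTorch, the mechanism MoE-X builds on. It is not MoE-X itself: the paper's interpretability-oriented design choices are not reproduced here, and all class and parameter names are illustrative.

```python
# Minimal top-k sparse Mixture-of-Experts layer (generic MoE mechanism, not MoE-X).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)   # per-token routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        logits = self.router(x)                                   # (B, T, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)        # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., k] == e                       # tokens sent to expert e at slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoE(d_model=64, d_hidden=128, num_experts=8, top_k=2)
print(layer(torch.randn(2, 10, 64)).shape)   # torch.Size([2, 10, 64])
```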