Tsinghua University and Ant Group have introduced AReaL-boba², an asynchronous reinforcement learning (RL) system designed to enhance large language model (LLM) reasoning. The open-source framework scales from a single GPU to large clusters and trains roughly 2.77-2.8× faster than synchronous RL baselines. AReaL-boba² natively supports agentic RL and multi-turn reasoning, enabling more efficient LLM training without extensive infrastructure changes. Concurrently, other research teams have advanced RL methodologies for LLMs, including the Writer team's GRPO approach, which improves reasoning by rewarding self-reflection, and the R3 Retrieval-Augmented Generation method, which combines RL with step-by-step information retrieval. NVIDIA has unveiled ProRL, a prolonged RL training technique that uncovers novel reasoning strategies beyond those of the base model. Additional innovations include TW-GRPO, which improves visual reasoning by weighting important tokens and granting partial credit, and MoE-X, a Mixture-of-Experts model that achieves better perplexity than GPT-2 while being more interpretable. Together, these developments mark ongoing progress in accelerating and refining RL for LLM reasoning and generalization.
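For readers unfamiliar with the GRPO family mentioned above, the following is a minimal NumPy sketch of the group-relative advantage that GRPO-style methods optimize: each prompt gets a group of sampled responses, and advantages are rewards normalized within that group rather than estimated by a value network. The function name and example rewards are illustrative only; TW-GRPO additionally weights tokens and assigns partial credit, which is not reproduced here.

```python
# Minimal sketch of a GRPO-style group-relative advantage (illustrative names,
# not taken from any of the cited papers). Each prompt has a group of sampled
# responses; advantages are rewards standardized within the group, so no
# separate value network is required.
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """rewards: shape (num_prompts, group_size), one scalar reward per sampled response."""
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled responses each.
rewards = np.array([[1.0, 0.0, 0.0, 1.0],
                    [0.2, 0.9, 0.4, 0.5]])
print(group_relative_advantages(rewards))
```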
Developing open Large Reasoning Models relies heavily on existing closed ones, limiting independent research. Short Chain-of-Thought LLMs lack the long-reasoning capabilities needed. This paper creates a large dataset of long Chain-of-Thought reasoning using short Chain-of-Thought https://t.co/wnq2DGe1eW
AReaL-boba² just dropped from Tsinghua University & Ant Group. An async RL framework built for LLM reasoning that runs on one desk-side GPU or a thousand without rewriting a line. They’re clocking ~2.8× speed over synchronous baselines, shipping 14B/7B checkpoints out of the https://t.co/AKebCWT9lX
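As a rough illustration of why asynchrony helps, the toy sketch below decouples rollout generation from training with a queue, so generation and gradient updates overlap instead of alternating in lockstep as in synchronous RL. All names (Trajectory, rollout_worker, trainer) are hypothetical and do not reflect AReaL-boba²'s actual API.

```python
# Toy sketch of the async idea behind AReaL-boba²-style systems: rollout workers
# keep generating trajectories while the trainer consumes them, instead of the
# generate-then-train lockstep of synchronous RL. All names are illustrative.
import queue, threading, time, random
from dataclasses import dataclass

@dataclass
class Trajectory:
    prompt_id: int
    reward: float
    policy_version: int   # real async RL systems use this for staleness control

traj_queue: "queue.Queue[Trajectory]" = queue.Queue(maxsize=64)
policy_version = 0

def rollout_worker(worker_id: int, num_rollouts: int) -> None:
    for i in range(num_rollouts):
        time.sleep(random.uniform(0.01, 0.05))          # stand-in for LLM generation
        traj_queue.put(Trajectory(prompt_id=worker_id * num_rollouts + i,
                                  reward=random.random(),
                                  policy_version=policy_version))

def trainer(num_updates: int, batch_size: int = 4) -> None:
    global policy_version
    for step in range(num_updates):
        batch = [traj_queue.get() for _ in range(batch_size)]   # blocks until data is ready
        time.sleep(0.02)                                        # stand-in for a gradient step
        policy_version += 1
        print(f"update {step}: mean reward {sum(t.reward for t in batch) / batch_size:.2f}")

workers = [threading.Thread(target=rollout_worker, args=(w, 8)) for w in range(2)]
train_thread = threading.Thread(target=trainer, args=(4,))
for t in workers + [train_thread]:
    t.start()
for t in workers + [train_thread]:
    t.join()
```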
Neurons in LLMs encode multiple concepts, obscuring understanding. This paper introduces MoE-X, a Mixture-of-Experts model designed for intrinsic interpretability. It achieves better perplexity than GPT-2 and surpasses sparse autoencoders in interpretability. Methods 🔧: → https://t.co/AceMaaDqCf
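For context, the sketch below shows a generic top-k sparse Mixture-of-Experts layer in PyTorch, the mechanism MoE-X builds on. It is not MoE-X itself: the paper's interpretability-oriented design choices are not reproduced here, and all class and parameter names are illustrative.

```python
# Minimal top-k sparse Mixture-of-Experts layer (generic MoE mechanism, not MoE-X).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)   # per-token routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        logits = self.router(x)                                   # (B, T, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)        # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., k] == e                       # tokens sent to expert e at slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoE(d_model=64, d_hidden=128, num_experts=8, top_k=2)
print(layer(torch.randn(2, 10, 64)).shape)   # torch.Size([2, 10, 64])
```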