Nous Research has launched Atropos, a new reinforcement learning (RL) environments framework designed to improve the training of large language models (LLMs). Atropos supports scalable, distributed RL pipelines in which models refine their reasoning and alignment through trial-and-error interaction with environments, the feedback loop that distinguishes RL from conventional fine-tuning. Nous reports notable performance gains, including a fivefold improvement on the Berkeley Function Calling benchmark with a specialized tool-calling model. The release marks a key development in advancing RL techniques for AI, alongside related machine learning research from academic institutions including Fudan University, Northwestern University, UC Berkeley, Carnegie Mellon University, Princeton University, New York University, Stanford University, and The University of Hong Kong.
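The trial-and-error loop such a framework coordinates can be sketched as follows. This is a hypothetical toy illustration, not Atropos's actual API: `ToolCallEnv`, `policy`, and `collect_rollouts` are invented names, and the rule-based `policy` stands in for an LLM sampling a completion.

```python
import random

class ToolCallEnv:
    """Toy environment: reward 1.0 iff the policy emits the expected tool call."""
    def __init__(self):
        self.tasks = [("get_weather", "What is the weather in Paris?"),
                      ("get_time", "What time is it in Tokyo?")]

    def reset(self) -> str:
        # Pick a task and return its prompt; remember the expected tool name.
        self.expected, prompt = random.choice(self.tasks)
        return prompt

    def step(self, action: str) -> float:
        # Score the attempted tool call against the expected one.
        return 1.0 if action == self.expected else 0.0

def policy(prompt: str) -> str:
    # Stand-in for an LLM: a real trainer would sample a completion here.
    return "get_weather" if "weather" in prompt else "get_time"

def collect_rollouts(env: ToolCallEnv, n: int):
    """Gather (prompt, action, reward) triples for a downstream policy update."""
    rollouts = []
    for _ in range(n):
        prompt = env.reset()
        action = policy(prompt)
        rollouts.append((prompt, action, env.step(action)))
    return rollouts

rollouts = collect_rollouts(ToolCallEnv(), 4)
print(sum(r for _, _, r in rollouts) / len(rollouts))  # mean reward
```

In a distributed setup, many such environments would run rollouts in parallel and stream scored trajectories back to a central trainer, which is the kind of coordination an RL environments framework is built to handle.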
[CL] SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning J Chen, B Zhang, R Ma, P Wang... [The University of Hong Kong & Tencent] (2025) https://t.co/TqeQ9c9OEg https://t.co/ixjiAmPsa8
[LG] Accelerating Mixture-of-Experts Training with Adaptive Expert Replication A Skiadopoulos, M Zhao, S Gandhi, T Norrie... [Stanford University] (2025) https://t.co/pYngupwZRK https://t.co/9FTfABmzUT
[LG] Emergence and scaling laws in SGD learning of shallow neural networks Y Ren, E Nichani, D Wu, J D Lee [Princeton University & New York University] (2025) https://t.co/ZdpGYJtQBd https://t.co/V78rZW0dzb