Researchers from Tsinghua University, the Beijing Institute for General Artificial Intelligence, and Pennsylvania State University have introduced a new reinforcement learning with verifiable rewards (RLVR) paradigm called Absolute Zero, instantiated in a model named the Absolute Zero Reasoner (AZR). This approach enables large language models (LLMs) to teach themselves reasoning skills without relying on any human-labeled data. AZR operates through a self-play mechanism in which the model generates its own coding puzzles, then solves them and grades the solutions autonomously by executing Python code. Despite starting from an empty task set, AZR outperforms models trained on tens to hundreds of thousands of labeled examples, suggesting the potential to improve the accuracy and capabilities of generative AI over time. Experts, including researchers affiliated with OpenAI, anticipate that such reasoning models could lead to new scientific discoveries and mark a shift in AI training paradigms.
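The self-play loop described above can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the function names (`execute`, `grade`), the single-function program convention `f`, and the example task are all assumptions; in practice the program would run in a proper sandbox rather than a bare `exec`.

```python
def execute(program_src: str, arg):
    """Run a proposed single-function program on an input and return its output.

    Hypothetical helper: AZR-style systems derive the gold answer by actually
    executing the proposed code, so no human label is needed. A real system
    would sandbox this call; plain exec() is used here only for illustration.
    """
    namespace = {}
    exec(program_src, namespace)
    return namespace["f"](arg)


def grade(predicted, gold) -> float:
    """Binary verifiable reward: 1.0 if the solver's answer matches the executed output."""
    return 1.0 if predicted == gold else 0.0


# --- propose phase: the model emits a program and an input (example task) ---
proposed_program = "def f(x):\n    return sorted(x)[::-1]"
proposed_input = [3, 1, 2]

# The environment computes the gold output by execution, making the task gradable.
gold_output = execute(proposed_program, proposed_input)

# --- solve phase: a model's predicted output is graded against the gold output ---
print(gold_output)                      # [3, 2, 1]
print(grade([3, 2, 1], gold_output))    # 1.0 (correct prediction)
print(grade([1, 2, 3], gold_output))    # 0.0 (incorrect prediction)
```

The key property is that the reward comes from code execution rather than from a human label, which is what lets the curriculum bootstrap from nothing.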
Tsinghua University’s Absolute Zero: Self-Training LLMs Without External Data #ArtificialIntelligence #MachineLearning #LargeLanguageModels #TsinghuaUniversity #SelfTrainingAI https://t.co/bhhmxfvq3E https://t.co/6HTkO9prcM
[ Meta‑Agentic α‑AGI 👁️✨ Demo v3 — AZR‑Powered “Alpha‑Factory v1” ] Absolute Zero Reasoner (AZR) self‑curriculum — a reinforced self‑play engine that perpetually invents and solves its own tasks, unlocking open‑ended cross‑domain reasoning. GitHub: https://t.co/bZDE01TS6h https://t.co/sMRgxiBl8u
AI That Teaches Itself: Tsinghua University’s ‘Absolute Zero’ Trains LLMs With Zero External Data Researchers from Tsinghua University, Beijing Institute for General Artificial Intelligence, and Pennsylvania State University have proposed an RLVR paradigm called Absolute Zero to https://t.co/ghgKHAQ8yh