Really fantastic research by @GoogleDeepMind 🧠 SCoRe: Multi-turn chain-of-thought online RL for LLM self-correction • SCoRe + inference-time scaling (maj@32): 10.5% improvement 🎯 Key points: • Uses self-generated data • Improves MATH by 15.6%, HumanEval by 9.1% • Single… https://t.co/k5AakQBH1F
Google DeepMind Introduced Self-Correction via Reinforcement Learning (SCoRe): A New AI Method Enhancing Large Language Models’ Accuracy in Complex Mathematical and Coding Tasks https://t.co/kmYHkgGaD1 #AI #SCoRe #LargeLanguageModels #ReinforcementLearning #SelfCorrection #ai… https://t.co/vQ4hACnWbP
1/n SCoRe: A Paradigm Shift in AI - Teaching Machines to Self-Correct Without External Guidance. Large language models (LLMs) have demonstrated remarkable capabilities in various domains, but their ability to correct their own mistakes remains a significant challenge. This… https://t.co/B7NJ2KWEXT
Google DeepMind has introduced SCoRe (Self-Correction via Reinforcement Learning), a method for improving the self-correction abilities of large language models (LLMs). It is a multi-turn chain-of-thought online reinforcement learning approach that trains entirely on self-generated data, and it targets accuracy on complex mathematical and coding tasks. Reported results include a 15.6% gain in self-correction on reasoning problems from the MATH dataset and a 9.1% gain on HumanEval. Combined with inference-time scaling (maj@32, that is, majority voting over 32 sampled answers), SCoRe achieves a 10.5% improvement. The method addresses the challenge of getting a model to correct its own mistakes without external guidance.
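
The maj@32 setting mentioned above refers to majority voting: sample 32 answers per problem and keep the most common one. A minimal Python sketch of that scoring step, assuming final answers have already been extracted and normalized to strings; the function names `majority_vote` and `maj_at_k_accuracy` are hypothetical and are not taken from the SCoRe work:

```python
from collections import Counter
from typing import Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # most_common(1) returns a one-element list of (answer, count) pairs.
    return Counter(answers).most_common(1)[0][0]


def maj_at_k_accuracy(
    samples_per_problem: Sequence[Sequence[str]],
    references: Sequence[str],
) -> float:
    """Fraction of problems whose majority-voted answer matches the reference.

    Each inner sequence holds the k sampled answers for one problem
    (k = 32 for maj@32).
    """
    if len(samples_per_problem) != len(references):
        raise ValueError("one reference answer is required per problem")
    if not references:
        raise ValueError("need at least one problem")
    correct = sum(
        majority_vote(samples) == ref
        for samples, ref in zip(samples_per_problem, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    # Toy data with k = 3 samples per problem, purely illustrative.
    samples = [["4", "5", "4"], ["7", "8", "8"]]
    refs = ["4", "7"]
    print(maj_at_k_accuracy(samples, refs))
```

Exact string matching is the simplest possible comparison; a real math benchmark would usually need answer normalization before voting, which this sketch leaves out.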