Researchers from Tsinghua University and Shanghai AI Lab have introduced a new AI framework called Diagram of Thought (DoT), which models iterative reasoning in large language models (LLMs) as the construction of a directed acyclic graph (DAG) within a single model, aiming to give LLM reasoning greater mathematical rigor. Additionally, Google DeepMind has developed SCoRe, a multi-turn online reinforcement learning approach that improves self-correction in LLMs using entirely self-generated data. SCoRe achieved state-of-the-art self-correction performance, improving accuracy by 15.6% on MATH and 9.1% on HumanEval. Another novel approach, Iteration of Thought (IoT), leverages inner dialogue for autonomous reasoning, while the Hidden Chain-of-Thought (HCoT) framework speeds up LLM inference while preserving multi-step reasoning capabilities. These advancements highlight the growing focus on improving reasoning and self-correction capabilities in LLMs.
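To make the DoT idea concrete: reasoning unfolds as a DAG whose nodes carry different roles (propositions, critiques, summaries) and whose edges record which earlier thoughts a new thought builds on. The sketch below is illustrative only — the class and method names are assumptions, not the paper's actual API:

```python
# Minimal sketch of a Diagram-of-Thought-style reasoning DAG.
# Node roles loosely follow the paper's description; all names
# here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class ThoughtNode:
    node_id: int
    role: str                               # "proposition", "critique", or "summary"
    text: str
    parents: list = field(default_factory=list)  # ids of nodes this thought builds on

class DiagramOfThought:
    def __init__(self):
        self.nodes = {}

    def add(self, role, text, parents=()):
        node = ThoughtNode(len(self.nodes), role, text, list(parents))
        self.nodes[node.node_id] = node
        return node.node_id

    def topological_order(self):
        # Edges only point from earlier to later nodes, so insertion
        # order is already a valid topological order of the DAG.
        return [self.nodes[i] for i in sorted(self.nodes)]

# Toy usage: propose, critique the proposal, refine, then summarize.
dot = DiagramOfThought()
p1 = dot.add("proposition", "Try factoring the quadratic.")
c1 = dot.add("critique", "Discriminant is negative; real factoring fails.", [p1])
p2 = dot.add("proposition", "Use the quadratic formula with complex roots.", [c1])
s1 = dot.add("summary", "Roots are complex conjugates.", [p2])
print([n.role for n in dot.topological_order()])
```

The acyclic structure is what lets a critique attach to an earlier proposition without discarding it, so failed branches remain visible to later reasoning steps.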
.@GoogleDeepMind introduced the first method to significantly improve LLMs' self-correction - SCoRe. What's its secret? Structured Reinforcement Learning (predictable, isn't it?). It can make LLMs better at fixing their mistakes in real time. SCoRe trains in 2 stages: 🧵 https://t.co/DFIBQBIERY
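The key RL ingredient in SCoRe's second stage is reward shaping that pays a bonus when the second attempt improves on the first, discouraging the model from simply leaving a correct answer alone or collapsing both attempts into one. A toy sketch of that shaping term — the `alpha` value and function names are assumptions for illustration, not the paper's exact formulation:

```python
# Toy sketch of SCoRe-style reward shaping for the second attempt.
# The real method runs multi-turn RL on full model rollouts; this
# just illustrates the shaping idea with binary correctness signals.
def shaped_reward(first_correct: bool, second_correct: bool, alpha: float = 2.0) -> float:
    base = 1.0 if second_correct else 0.0
    # Bonus amplifies transitions: it rewards fixing a wrong first
    # attempt and penalizes breaking a correct one.
    delta = (1.0 if second_correct else 0.0) - (1.0 if first_correct else 0.0)
    return base + alpha * delta

print(shaped_reward(False, True))   # wrong -> right: largest reward, 3.0
print(shaped_reward(True, False))   # right -> wrong: penalized, -2.0
print(shaped_reward(True, True))    # stayed right: base reward only, 1.0
```

Without the `alpha` term, the model has no incentive to actually *change* a wrong first answer, which is the failure mode the two-stage setup is designed to avoid.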
This paper from @GoogleDeepMind proposes a new way to train LLMs for self-correction WITHOUT relying on a more capable model or other forms of supervision. Instead, it uses self-generated data. ---- Generated this podcast with @Google 's NotebookLM Earlier generated… https://t.co/KsXlO5vImS https://t.co/k5AakQBH1F
Back to basics! A thread about Chain-of-Thought (CoT) 🧵 https://t.co/H5F2fmuFNP