Researchers from the University of Waterloo, Carnegie Mellon University (CMU), and the Vector Institute have introduced Critique Fine-Tuning (CFT), an approach that strengthens the reasoning of large language models (LLMs) by training them to critique candidate responses rather than to imitate reference answers. The release is accompanied by WebInstruct-CFT, a dataset of instruction-critique pairs detailed below. Two related threads round out the picture: RealCritic, a benchmark that evaluates LLM critiques by their ability to improve solutions rather than by verdict accuracy alone, and a set of lessons on aligning LLM judges with human evaluations through automatic prompt optimization.
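To make the CFT training signal concrete, here is a minimal sketch of how an instruction-critique pair might be turned into a supervised example. The prompt template and field names are illustrative assumptions, not the paper's exact format:

```python
# Minimal sketch of building a CFT training example, assuming a simple
# prompt template. Field names and wording are illustrative assumptions,
# not the paper's exact format.

def build_cft_example(instruction: str, candidate_response: str, critique: str) -> dict:
    """In CFT the model reads a query plus a candidate response and is
    trained to emit a critique, instead of imitating a gold answer (SFT)."""
    prompt = (
        f"Question:\n{instruction}\n\n"
        f"Candidate solution:\n{candidate_response}\n\n"
        "Critique the solution above, pointing out any errors."
    )
    # As in standard instruction tuning, the loss would be applied only
    # to the target tokens, here the critique.
    return {"input": prompt, "target": critique}

example = build_cft_example(
    instruction="Compute 17 * 24.",
    candidate_response="17 * 24 = 398",
    critique="Incorrect: 17 * 4 is 68, not 58, so the total is 340 + 68 = 408.",
)
print(example["input"])
```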
Ensuring that an LLM judge aligns with human judgment is a critical challenge for evaluation. One thread of the discussion explores various automatic prompt optimization techniques for closing this gap and summarizes the insights gained along the way.
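The original post is truncated, but automatic prompt optimization for judge alignment typically amounts to a search loop over candidate judge prompts scored against human labels. The sketch below shows one common variant, hill climbing; `llm_judge` and `propose_revision` are placeholders for model calls, not a specific library API:

```python
# Hedged sketch of an automatic prompt optimization loop for aligning an
# LLM judge with human labels. `llm_judge` and `propose_revision` are
# placeholder callables standing in for model calls.

def agreement(prompt: str, labeled: list[tuple[str, str]], llm_judge) -> float:
    """Fraction of human-labeled examples where the judge's verdict matches."""
    hits = sum(llm_judge(prompt, example) == label for example, label in labeled)
    return hits / len(labeled)

def optimize_judge_prompt(seed: str, labeled: list[tuple[str, str]],
                          llm_judge, propose_revision, steps: int = 20) -> str:
    best, best_score = seed, agreement(seed, labeled, llm_judge)
    for _ in range(steps):
        # Ask an LLM to rewrite the prompt, e.g. conditioned on the
        # examples the current prompt misjudged.
        candidate = propose_revision(best, labeled)
        score = agreement(candidate, labeled, llm_judge)
        if score > best_score:  # hill climb: keep only improvements
            best, best_score = candidate, score
    return best
```

In practice the loop stops when agreement with the human labels plateaus on a held-out set, to avoid overfitting the judge prompt to the labeled examples.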
The RealCritic benchmark measures LLM critique quality by correction success, not just verdict accuracy: in a closed-loop setup, a critique is handed back to a model that revises the solution, and the critique is scored by whether the revision actually improves. The benchmark covers self-critique, cross-critique, and iterative critique scenarios.
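A closed-loop score of this kind can be sketched as follows. This is a simplified illustration of the idea, not the benchmark's own API: `critic`, `solve_with_critique`, and `is_correct` are assumed placeholders for a critic model, a revision step, and a task-specific answer checker:

```python
# Sketch of closed-loop critique scoring in the spirit of RealCritic.
# All three callables are assumptions, not the benchmark's API.

def correction_success_rate(items, critic, solve_with_critique, is_correct) -> float:
    """Score a critic by whether its critiques turn draft answers into correct ones."""
    fixed = 0
    for question, draft_answer in items:
        critique = critic(question, draft_answer)
        revised = solve_with_critique(question, draft_answer, critique)
        fixed += is_correct(question, revised)
    # Correction success: fraction of revised solutions that end up correct,
    # rather than merely checking the critic's right/wrong verdict.
    return fixed / len(items)
```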
WebInstruct-CFT: teaching LLMs to critique
- 600K instruction-critique pairs
- 65% math, with the remainder drawn from business and the sciences
- detailed critiques generated with GPT-4
- released in three sizes: 4K, 50K, and 600K examples
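If the dataset is published in the usual way on the Hugging Face Hub, one of the three subsets could be loaded roughly as follows. The repository id and subset name are assumptions based on the announcement, so check the actual dataset card before use:

```python
# Hedged sketch: loading a WebInstruct-CFT subset with the `datasets`
# library. The repository id and subset name are assumptions; consult
# the dataset card for the real identifiers and field names.
from datasets import load_dataset

ds = load_dataset("TIGER-Lab/WebInstruct-CFT", "WebInstruct-CFT-4K", split="train")
print(ds[0])  # expected: an instruction plus candidate response, and its critique
```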