
Google DeepMind, in collaboration with Stanford University, has announced a new tool designed to improve fact-checking of content produced by large language models (LLMs): the Search-Augmented Factuality Evaluator (SAFE). SAFE uses an LLM to break generated text into individual facts and then verifies each fact against Google Search results. The work targets the factual errors that LLMs often introduce when responding to open-ended, fact-seeking prompts.

The researchers argue that LLM agents can serve as automated evaluators of long-form factuality, reporting that such agents can achieve superhuman rating performance in fact-checking while costing up to 20 times less than human annotators. They also find that larger models tend to be more factual.

SAFE is part of a broader effort to benchmark long-form factuality in open domains. Alongside the evaluation method, the work provides a new dataset of fact-seeking prompts (itself generated with LLMs) and an aggregation metric that accounts for both precision and recall; the autorater is an LLM agent equipped with Google Search. Using this setup, the researchers analyze thirteen popular LLMs across the Gemini, GPT, Claude, and PaLM-2 families, aiming to create a realistic benchmark that reflects the everyday knowledge-seeking queries users pose.
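As a rough illustration of how an aggregation metric can balance precision and recall over per-fact verdicts, the sketch below scores a response once an LLM agent has split it into facts and checked each one with search. The function name f1_at_k, the parameter k (the number of supported facts a response is expected to contain), and the toy numbers are assumptions for illustration, not the paper's exact interface.

```python
def f1_at_k(verdicts: list[bool], k: int) -> float:
    """Combine per-fact support verdicts into one score.

    Precision is the share of stated facts that search supported;
    recall is capped at 1 once the response contains k supported
    facts, so longer answers are rewarded only up to that length.
    """
    supported = sum(verdicts)
    if supported == 0:
        return 0.0
    precision = supported / len(verdicts)
    recall = min(supported / k, 1.0)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example: 8 of 10 extracted facts were supported by search.
    verdicts = [True] * 8 + [False] * 2
    print(f1_at_k(verdicts, k=8))  # precision 0.8, recall 1.0 -> ~0.889
```

Scoring this way penalizes responses that pad in unsupported claims (lower precision) without unboundedly rewarding sheer length (recall saturates at k supported facts).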


