
Researchers from Google DeepMind, in collaboration with Stanford University, have introduced a new approach to evaluating the factuality of long-form responses generated by large language models (LLMs). The method, called the Search-Augmented Factuality Evaluator (SAFE), uses an LLM as an automated fact-checker: it splits a long-form response into individual claims and then verifies each claim by issuing Google Search queries and reasoning over the returned results. The researchers report that this LLM agent achieves superhuman rating performance, agreeing with crowdsourced human annotators on most claims and, on a sample of cases where they disagreed, producing the correct verdict more often than the humans, while being more than 20 times cheaper than human annotation. They also find that larger models tend to be more factual. The team has released a full evaluation pipeline, including a new prompt dataset and the automated rater, showing that LLM-based evaluators can rate long-form factuality more reliably than crowdsourced human raters. This development is particularly timely, given growing concern over the factual accuracy of information generated by AI systems.
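To make the pipeline concrete, here is a minimal sketch of a SAFE-style evaluator under the description above: decompose a response into individual claims, generate a search query per claim, and have the LLM judge each claim against the retrieved snippets. This is not the released implementation; `call_llm` and `google_search` are hypothetical placeholders for whatever LLM and search APIs you use, and the prompts are simplified for illustration.

```python
# Sketch of a SAFE-style factuality check: split -> search -> judge.
# call_llm() and google_search() are placeholders, not real library calls.

from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Placeholder: wrap your LLM API client here."""
    raise NotImplementedError("plug in an LLM client")


def google_search(query: str, num_results: int = 3) -> list[str]:
    """Placeholder: wrap a search API here; should return result snippets."""
    raise NotImplementedError("plug in a search client")


@dataclass
class FactRating:
    fact: str
    supported: bool


def split_into_facts(response: str) -> list[str]:
    # Step 1: ask the LLM to decompose the long-form response into
    # self-contained, individually checkable claims (one per line).
    out = call_llm(
        "Split the following response into individual, self-contained "
        f"factual claims, one per line:\n\n{response}"
    )
    return [line.strip() for line in out.splitlines() if line.strip()]


def rate_fact(fact: str) -> FactRating:
    # Step 2: generate a search query for the claim.
    query = call_llm(f"Write a Google Search query to verify this claim: {fact}")
    # Step 3: retrieve evidence snippets.
    snippets = "\n".join(google_search(query))
    # Step 4: ask the LLM to judge whether the evidence supports the claim.
    verdict = call_llm(
        "Given these search results, answer SUPPORTED or NOT_SUPPORTED "
        f"for the claim.\n\nClaim: {fact}\n\nResults:\n{snippets}"
    )
    return FactRating(fact=fact, supported="NOT" not in verdict.upper())


def evaluate_response(response: str) -> list[FactRating]:
    # Per-claim ratings can then be aggregated into a factuality score.
    return [rate_fact(f) for f in split_into_facts(response)]
```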


