
Researchers from Google DeepMind and Stanford have introduced a new approach for evaluating long-form factuality in large language models (LLMs). The method, called Search-Augmented Factuality Evaluator (SAFE), uses an LLM agent to break a response into individual claims and verify each one against Google Search results. The authors report that SAFE achieves superhuman rating performance: it agrees with crowdsourced human annotators on most claims, outperforms them on a sample of disagreement cases, and is substantially cheaper than human annotation.
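As a rough illustration of the pipeline described above, the sketch below decomposes a long-form response into individual claims, gathers search evidence for each one, and asks the model for a supported / not-supported verdict. It is a minimal sketch, not the authors' implementation: the `llm` and `search` callables, the prompts, and helper names such as `split_into_facts` and `rate_fact` are placeholders for whatever model and search backend you plug in, and the real SAFE agent uses more elaborate multi-step prompting.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumed interfaces (not part of the paper's code):
#   llm(prompt) -> text completion
#   search(query) -> list of result snippets
LLM = Callable[[str], str]
Search = Callable[[str], List[str]]


@dataclass
class FactVerdict:
    fact: str
    label: str            # "supported", "not supported", or "irrelevant"
    evidence: List[str]


def split_into_facts(response: str, llm: LLM) -> List[str]:
    """Ask the LLM to decompose a long-form response into self-contained claims."""
    prompt = (
        "Split the following response into a list of self-contained factual "
        "claims, one per line:\n\n" + response
    )
    return [line.strip("- ").strip() for line in llm(prompt).splitlines() if line.strip()]


def rate_fact(fact: str, llm: LLM, search: Search, max_queries: int = 3) -> FactVerdict:
    """Issue search queries for one claim, then ask the LLM for a verdict."""
    queries: List[str] = []
    evidence: List[str] = []
    for _ in range(max_queries):
        # Show previously issued queries so the model can diversify its next one.
        query = llm(
            f"Claim: {fact}\nPrevious queries: {queries}\n"
            "Write the next Google Search query to help verify this claim."
        )
        queries.append(query)
        evidence.extend(search(query))
    verdict_prompt = (
        "Claim: " + fact + "\n\nSearch results:\n" + "\n".join(evidence) +
        "\n\nAnswer with exactly one of: supported, not supported, irrelevant."
    )
    return FactVerdict(fact=fact, label=llm(verdict_prompt).strip().lower(), evidence=evidence)


def evaluate_response(response: str, llm: LLM, search: Search) -> List[FactVerdict]:
    """SAFE-style evaluation: decompose a response, then verify each claim independently."""
    return [rate_fact(f, llm, search) for f in split_into_facts(response, llm)]
```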

"Long-Form Factuality in Large Language Models" introduces a new approach to evaluating and benchmarking the factuality of long-form responses generated by large language models (LLMs). Key contributions: https://t.co/61SPVtboDN
Researchers from Google DeepMind and Stanford Introduce Search-Augmented Factuality Evaluator (SAFE): Enhancing Factuality Evaluation in Large Language Models Quick read: https://t.co/anXisulDKY Researchers from Google DeepMind and Stanford University have introduced a novel…
People and companies lie about AI. https://t.co/CTFindvjC4