
Researchers from Google DeepMind and Stanford introduce the Search-Augmented Factuality Evaluator (SAFE), a new method for evaluating long-form factuality in large language models (LLMs). SAFE uses an LLM agent to break a long-form response into individual facts and verify each one against Google Search results. The evaluator achieves superhuman rating performance, matching or outperforming crowdsourced human annotators, while being more than 20 times cheaper than human fact-checking. The research also finds that larger models are generally more factual.
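To make the pipeline concrete, here is a minimal sketch of a SAFE-style evaluation loop: split a response into self-contained facts, search for evidence on each, and have an LLM rate support. This is not the authors' code; `call_llm` and `web_search` are placeholders for whatever model and search API you actually use.

```python
from typing import Callable, Dict, List

def safe_style_eval(
    response: str,
    call_llm: Callable[[str], str],    # prompt -> model text (placeholder)
    web_search: Callable[[str], str],  # query -> result snippets (placeholder)
) -> Dict[str, int]:
    # 1. Split the long-form response into individual, self-contained facts.
    facts: List[str] = [
        line.strip("- ").strip()
        for line in call_llm(
            "List each individual fact in the following text, "
            "one per line, rewritten to be self-contained:\n" + response
        ).splitlines()
        if line.strip()
    ]

    supported = not_supported = 0
    for fact in facts:
        # 2. Ask the LLM for a search query, fetch evidence, and rate the fact.
        query = call_llm("Write a web search query to verify: " + fact)
        evidence = web_search(query)
        verdict = call_llm(
            "Fact: " + fact + "\nSearch results: " + evidence +
            "\nAnswer SUPPORTED or NOT_SUPPORTED."
        )
        if "NOT_SUPPORTED" in verdict.upper():
            not_supported += 1
        else:
            supported += 1

    # 3. Return per-fact counts; the paper aggregates these into
    #    precision/recall-style factuality metrics.
    return {"supported": supported, "not_supported": not_supported}
```

The per-fact loop is where the cost advantage comes from: each verification is a handful of LLM and search calls rather than a paid human annotation.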



"Long-Form Factuality in Large Language Models" introduces a new approach to evaluating and benchmarking the factuality of long-form responses generated by large language models (LLMs). Key contributions: https://t.co/61SPVtboDN
Researchers from Google DeepMind and Stanford Introduce Search-Augmented Factuality Evaluator (SAFE): Enhancing Factuality Evaluation in Large Language Models Quick read: https://t.co/anXisulDKY Researchers from Google DeepMind and Stanford University have introduced a novel…
People and companies lie about AI. https://t.co/CTFindvjC4