As LLMs grow bigger, they're more likely to give wrong answers than admit ignorance @nature https://t.co/R7Rf7BYVt2
Larger and more instructable language models become less reliable | Nature h/t @lexin_zhou #LLMs https://t.co/X6yNUlOcWj
Amid all the chaos at OpenAI I hope this paper in Nature gets some attention. It finds that larger language models become *less* reliable for a variety of reasons—for one, where earlier models avoided questions they couldn't answer, newer ones are more likely to make something up https://t.co/FO5LajoAKx
A new study published in Nature, "Larger and more instructable language models become less reliable", finds that as LLMs such as OpenAI's GPT, Meta's Llama, and BigScience's BLOOM are scaled up and shaped with instruction tuning and human feedback, they become more likely to give an incorrect answer than to admit ignorance. The research, by authors including Lexin Zhou, Wout Schellaert, and Jose Hernandez-Orallo, examines three aspects of reliability: difficulty concordance, task avoidance, and prompting stability. Earlier models often declined to answer questions they could not handle, whereas newer models are more prone to bluffing a plausible-sounding answer. The models also keep failing on questions that humans consider easy, and this mismatch between human expectations of difficulty and where LLMs actually err poses significant challenges for the reliability of these systems. Ilya Sutskever's 2022 prediction that this discrepancy would diminish over time has not been borne out, and the lower avoidance of newer models makes wrong answers more, not less, likely.
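To make the avoidance/error trade-off concrete, here is a minimal sketch with invented labels (not data from the study): since every answer is either correct, avoidant, or incorrect, the three fractions sum to one, so a newer model that avoids less without a matching gain in correctness necessarily produces more incorrect (bluffed) answers. The `response_profile` helper and the example label lists are hypothetical.

```python
# Minimal sketch: compare response profiles of an "older" and a "newer" model.
# Labels are hypothetical illustrations, not results from the Nature paper.
from collections import Counter

def response_profile(labels):
    """Return the fraction of correct, avoidant, and incorrect answers."""
    counts = Counter(labels)
    total = len(labels)
    return {k: counts.get(k, 0) / total for k in ("correct", "avoidant", "incorrect")}

# Ten questions each; the newer model avoids less but bluffs more.
older = ["correct"] * 5 + ["avoidant"] * 4 + ["incorrect"] * 1
newer = ["correct"] * 6 + ["avoidant"] * 1 + ["incorrect"] * 3

print("older:", response_profile(older))  # avoidance 0.4, errors 0.1
print("newer:", response_profile(newer))  # avoidance 0.1, errors 0.3
```

Running the sketch shows the pattern the paper describes: the drop in avoidance (0.4 to 0.1) is only partly absorbed by extra correct answers, so the error rate rises from 0.1 to 0.3.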