As LLMs grow bigger, they're more likely to give wrong answers than admit ignorance @nature https://t.co/R7Rf7BYVt2
Larger and more instructable language models become less reliable | Nature h/t @lexin_zhou #LLMs https://t.co/X6yNUlOcWj
Amid all the chaos at OpenAI I hope this paper in Nature gets some attention. It finds that larger language models become *less* reliable for a variety of reasons—for one, where earlier models avoided questions they couldn't answer, newer ones are more likely to make something up https://t.co/FO5LajoAKx
A new study published in Nature, "Larger and more instructable language models become less reliable", finds that as LLMs such as OpenAI's GPT, Meta's Llama, and BigScience's BLOOM are scaled up and shaped with instruction tuning and human feedback, they become more likely to give an incorrect answer than to admit ignorance. The research, by authors including Lexin Zhou, Wout Schellaert, and Jose Hernandez-Orallo, examines three aspects of reliability: difficulty concordance, task avoidance, and prompting stability. Earlier models often declined to answer questions they could not handle, whereas newer models are more prone to bluffing a plausible-sounding answer. The models also keep failing on questions that humans consider easy, and this mismatch between human expectations of difficulty and where LLMs actually err poses significant challenges for the reliability of these systems. Ilya Sutskever's 2022 prediction that this discrepancy would diminish over time has not been borne out, and the lower avoidance of newer models makes wrong answers more, not less, likely.
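To make the avoidance/error trade-off concrete, here is a minimal sketch with invented labels (not data from the study): since every answer is either correct, avoidant, or incorrect, the three fractions sum to one, so a newer model that avoids less without a matching gain in correctness necessarily produces more incorrect (bluffed) answers. The `response_profile` helper and the example label lists are hypothetical.

```python
# Minimal sketch: compare response profiles of an "older" and a "newer" model.
# Labels are hypothetical illustrations, not results from the Nature paper.
from collections import Counter

def response_profile(labels):
    """Return the fraction of correct, avoidant, and incorrect answers."""
    counts = Counter(labels)
    total = len(labels)
    return {k: counts.get(k, 0) / total for k in ("correct", "avoidant", "incorrect")}

# Ten questions each; the newer model avoids less but bluffs more.
older = ["correct"] * 5 + ["avoidant"] * 4 + ["incorrect"] * 1
newer = ["correct"] * 6 + ["avoidant"] * 1 + ["incorrect"] * 3

print("older:", response_profile(older))  # avoidance 0.4, errors 0.1
print("newer:", response_profile(newer))  # avoidance 0.1, errors 0.3
```

Running the sketch shows the pattern the paper describes: the drop in avoidance (0.4 to 0.1) is only partly absorbed by extra correct answers, so the error rate rises from 0.1 to 0.3.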