Useful reading for those who hope/believe that we can "bake" #ethics into #AI models: https://t.co/avCg2PoPEm #tech #LLMs #research
"The recent sycophantic version of #ChatGPT was untrustworthy in clunky, obvious ways... But as models become stronger, ... and as they get tasked on more complicated work, it’s going to be much harder to notice misbehavior": https://t.co/avCg2PoPEm #ethics #AI #tech #safety
According to the latest developer survey, OpenAI's o3 has become more hallucinatory (by over 6%) than previous models. This is not something developers want in 2025. https://t.co/NccP4EOa9K
OpenAI's latest testing has revealed that its new reasoning models, o3 and o4-mini, exhibit high hallucination rates of 33% and 48%, respectively. These models, designed to simulate human-like thought processes, have instead produced a notable share of inaccurate or fabricated responses. In response, OpenAI has released an official guide detailing the appropriate use cases for its six AI models: GPT-4o for multimodal everyday tasks, GPT-4.5 for creative tasks, o4-mini for fast reasoning, o4-mini-high for more technical reasoning, o3 for long multi-step tasks, and o1 pro for complex analytical tasks.

The increased hallucination rates have raised concerns among developers, as reflected in a recent survey indicating that o3 is more hallucinatory by over 6% than previous versions. Experts emphasize the importance of developing AI models that do not simply imitate human behavior, in order to improve trustworthiness, honesty, and transparency. Notably, AI ethics discussions highlight challenges with current models trained to please users, which may reinforce biases rather than provide truthful responses. AI researcher Yoshua Bengio has advocated for a new approach called "Scientist AI," aiming for safer and more controlled AI development, in contrast with the current trajectory of agency-driven models.
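For readers who want to see what the use-case guide looks like in practice, here is a minimal sketch (not an official OpenAI example) of routing a request to the model suggested for each task category, using the openai Python SDK's Chat Completions endpoint. The task categories, the task-to-model mapping, and the exact API model identifiers (e.g. "gpt-4.5-preview", "o4-mini") are illustrative assumptions and should be checked against OpenAI's current model list.

```python
# Illustrative sketch: pick a model per task category, following the use-case
# guide summarized above. Model identifiers are assumptions and may change.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assumed mapping from task category to model name.
TASK_TO_MODEL = {
    "everyday_multimodal": "gpt-4o",
    "creative_writing": "gpt-4.5-preview",
    "fast_reasoning": "o4-mini",
    "technical_reasoning": "o4-mini",  # "o4-mini high" is a ChatGPT setting, not a separate API model
    "long_multistep": "o3",
    "complex_analysis": "o1",          # the "pro" variant may require a different endpoint
}

def answer(task_type: str, prompt: str) -> str:
    """Send the prompt to the model suggested for this task category."""
    model = TASK_TO_MODEL.get(task_type, "gpt-4o")  # fall back to the general-purpose model
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example: route a long multi-step task to o3.
print(answer("long_multistep", "Outline a step-by-step plan to migrate a legacy ETL pipeline."))
```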