The AI research organization ARC has released a public preview of ARC-AGI-3, an interactive reasoning benchmark designed to challenge advanced AI systems. This version comprises six novel games, three of which are available in the initial release. The games test interactive reasoning abilities such as adaptive world modeling and agent-based problem solving, in contrast to earlier benchmarks that emphasized static reasoning well suited to deep learning. Current frontier models, including leading large language models like OpenAI's o3, score 0% on these tasks, while humans consistently reach 100%. The benchmark aims to provide a more rigorous measure of progress toward artificial general intelligence (AGI) by testing an AI's ability to generalize in novel environments. The ARC team stresses the need for rigorous, honest benchmarking, noting that many AI benchmark results have suffered from saturation, contamination, and disputed answer keys. ARC-AGI-3 is part of an ongoing effort to push AI development beyond existing capabilities and to better assess reasoning in dynamic, interactive contexts.
(Since I am on a benchmark theme today) The ARC team does well at keeping AI labs honest about their benchmarks, including showing that Qwen's big ARC-AGI performance doesn't replicate. But ARC-AGI also has a strong philosophy of what AI should do. We need other benchmarking efforts https://t.co/e7q9f3ZRAC
Reinforcement learning is powerful, but not always practical. That’s why the new open-source framework ART caught our eye. It makes RL usable for LLM agents, and in this walkthrough, you’ll see how it trains a small open model to beat GPT-4o-mini at Tic-Tac-Toe. https://t.co/nMApPX6Vs5
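As a rough intuition for the kind of loop a framework like ART automates, here is a minimal tabular Q-learning sketch that learns Tic-Tac-Toe against a random opponent. This is not ART's actual API (ART applies RL to LLM agents via rollouts and rewards rather than a lookup table); the environment, reward values, and hyperparameters below are illustrative assumptions.

```python
# Illustrative sketch only: tabular Q-learning for Tic-Tac-Toe vs. a random opponent.
# Not ART's API; just the reward-driven training loop idea in miniature.
import random
from collections import defaultdict

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]

def place(board, idx, mark):
    new = list(board)
    new[idx] = mark
    return tuple(new)

Q = defaultdict(float)                 # Q[(state, action)] -> value estimate
ALPHA, GAMMA, EPSILON = 0.3, 0.9, 0.1  # learning rate, discount, exploration

def choose(board):
    moves = legal_moves(board)
    if random.random() < EPSILON:
        return random.choice(moves)                      # explore
    return max(moves, key=lambda m: Q[(board, m)])       # exploit

def train(episodes=50_000):
    for _ in range(episodes):
        board = (" ",) * 9
        while True:
            move = choose(board)
            after_agent = place(board, move, "X")
            if winner(after_agent) == "X":               # agent wins: reward +1
                Q[(board, move)] += ALPHA * (1.0 - Q[(board, move)])
                break
            if not legal_moves(after_agent):             # draw: reward 0
                Q[(board, move)] += ALPHA * (0.0 - Q[(board, move)])
                break
            # Random opponent replies with "O".
            after_opp = place(after_agent, random.choice(legal_moves(after_agent)), "O")
            if winner(after_opp) == "O":                 # agent loses: reward -1
                Q[(board, move)] += ALPHA * (-1.0 - Q[(board, move)])
                break
            if not legal_moves(after_opp):               # draw: reward 0
                Q[(board, move)] += ALPHA * (0.0 - Q[(board, move)])
                break
            # Non-terminal step: bootstrap from the best action in the next state.
            best_next = max(Q[(after_opp, m)] for m in legal_moves(after_opp))
            Q[(board, move)] += ALPHA * (GAMMA * best_next - Q[(board, move)])
            board = after_opp

train()
```

After enough episodes the learned policy should rarely lose to a random opponent. The same reward-driven idea, scaled from a lookup table to LLM rollouts, is what frameworks like ART are meant to make practical.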
The mitigating factor for the problems with AI benchmarks (errors, saturation, contamination) is that, despite those issues, they are all still fairly heavily correlated. So if your AI does well on GPQA or MMLU or HLE, it also tends to do well on other benchmarks & on vibes & real work.