
The ARC-AGI benchmark, which offers a $1 million prize, has garnered significant attention recently as a problem that large language models (LLMs) find very hard. Ryan Greenblatt achieved a state-of-the-art (SOTA) result, reaching 71% accuracy on a set of examples where humans typically score 85%. His approach uses a carefully crafted few-shot prompt to have an LLM generate many candidate Python programs implementing each transformation, producing ~5k guesses per problem, then selects the best candidates by checking them against the training examples, with an additional debugging step. Some experts argue that solving ARC-AGI would not equate to achieving artificial general intelligence (AGI), but they recognize it as a valuable challenge that highlights LLMs' weakness on cell-based rules such as the Game of Life. Evaluated with GPT-4o on the public set, the approach reached 50% accuracy, demonstrating progress through clever tricks around existing models and increased search compute.
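The selection step of this generate-and-check approach can be sketched as follows. This is a minimal illustration, not the actual pipeline: the candidate programs below are hardcoded stand-ins for LLM-generated ones, whereas the real approach samples thousands of candidates from a few-shot prompt and adds a debugging pass.

```python
# Sketch of the candidate-selection step: score each generated program
# against the training examples and keep the ones that fit best.
# (Candidates are hardcoded stand-ins for LLM-generated programs.)

def score(program, examples):
    """Count how many training pairs the program transforms correctly."""
    hits = 0
    for grid_in, grid_out in examples:
        try:
            if program(grid_in) == grid_out:
                hits += 1
        except Exception:
            pass  # buggy candidates simply score zero on that pair
    return hits

def select_best(candidates, examples, k=2):
    """Rank candidates by training accuracy; keep the top k as guesses."""
    return sorted(candidates, key=lambda p: score(p, examples), reverse=True)[:k]

# Toy task: the hidden rule is "transpose the grid".
examples = [
    ([[1, 2], [3, 4]], [[1, 3], [2, 4]]),
    ([[0, 5], [6, 0]], [[0, 6], [5, 0]]),
]
candidates = [
    lambda g: g,                           # identity
    lambda g: [list(r) for r in zip(*g)],  # transpose
    lambda g: [row[::-1] for row in g],    # mirror each row
]
best = select_best(candidates, examples, k=1)[0]
print(best([[7, 8], [9, 1]]))  # the transpose candidate wins: [[7, 9], [8, 1]]
```

Selecting by training-example fit is what makes generating thousands of guesses viable: most candidates are wrong or crash, but verification against the examples is cheap and reliable.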

Progress on $1M ARC-AGI benchmark that is very hard for LLMs by carefully-crafted few-shot prompt to generate many possible Python programs to implement the transformations, generating ~5k guesses, selecting the best ones using the examples, and a debugging step. https://t.co/jCfuY1fsps
50% on ARC-AGI with GPT-4o
This wonderful blog post brings out another point that I didn't explicitly mention in my blog -- ARC-AGI gets solved with a bunch of very clever tricks around existing models, and more search compute. https://t.co/YvoT4PC3yz https://t.co/CeXqixsbSF