Researchers at New York University (NYU) conducted an independent study of human performance on the ARC-AGI tasks. They found that 790 of the 800 public tasks (98.7%) were solved by at least one Amazon Mechanical Turk worker, with each task seen by about 10 people. This is more direct evidence that ARC-AGI tasks are solvable by humans, and it underscores the gap between human and AI performance on them. The evidence is still imperfect, since solvability was measured only on the public tasks. For v2 of ARC-AGI, the stated targets are 100% solvability and human baselines on the private test set. The ARC-AGI competition, which challenges participants to develop AI capable of solving these tasks, will end on November 10, 2024. Many high-scoring entries in the competition currently rely on basic brute-force program search.
One inspiration for ARC-AGI solutions is the psychology of how humans solve novel tasks. A new study by @todd_gureckis @LakeBrenden @solimlegris @wkvong @ NYU explores human performance on ARC, finding that 98.7% of the public tasks are solvable by at least 1 MTurker. https://t.co/Yf0sKTRnXf
New ARC-AGI human study from NYU: we now have more direct evidence that "all ARC-AGI tasks can be solved by humans". Still imperfect though, and only measured on the public test set. For v2 we're targeting 100% solvability and getting human baselines on the private test set. https://t.co/AVi6BZyQa0
Researchers at NYU did a study on whether MTurkers could solve ARC-AGI tasks. They found that 790/800 (98.7%) of the public tasks are solvable by at least one MTurker (each task was seen by about 10 people): https://t.co/5PmCJT778k As a reminder, the private test set was…