Apple researchers released a paper in early June titled “The Illusion of Thinking,” contending that leading large language models (including OpenAI’s o1 and o3, DeepSeek-R1, and Claude 3.7 Sonnet) solve complex problems mainly through pattern matching rather than genuine reasoning. The study reported a sharp drop-off in accuracy on multi-step tasks and argued that current evaluation benchmarks overstate machine reasoning capabilities. Within days, multiple teams issued rebuttals. A response paper, “The Illusion of the Illusion of Thinking,” co-authored by Anthropic’s Claude Opus, argues that the failures Apple observed stemmed from token-limit truncation, rigid scoring methods, and the inclusion of unsolvable puzzles rather than from an inability to reason. The authors claim that when output-length constraints are removed or answers are expressed in compressed form, model performance rebounds, undermining Apple’s conclusion that reasoning collapses. Other researchers echoed the criticism, while some analysts, including AI commentator Gary Marcus, found the counterarguments unconvincing and noted mathematical mistakes in several follow-up studies. The episode has opened a broader debate over how to design tests that distinguish genuine reasoning from statistical pattern matching, highlighting the unsettled state of measuring cognitive capabilities in rapidly advancing AI systems.
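The rebuttal’s token-budget point is easiest to see with a small worked example. The sketch below is an illustration, not code from either paper; it uses Tower of Hanoi, one of the puzzles in Apple’s study, where the optimal solution length grows exponentially with the number of disks, so an exhaustive move listing quickly exceeds any fixed output budget while a short program that generates the moves does not. The tokens-per-move figure and the 64k output budget are assumptions chosen purely for illustration.

```python
# A minimal sketch (not code from either paper) of the rebuttal's token-budget
# argument, using Tower of Hanoi. The optimal solution has 2**n - 1 moves, so
# writing every move out grows exponentially with n, while a "compressed"
# answer (a short program that generates the moves) stays constant in length.

def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Yield the optimal move sequence for an n-disk Tower of Hanoi."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, src, dst, aux)
    yield (src, dst)
    yield from hanoi_moves(n - 1, aux, src, dst)

# Sanity check: the generator produces exactly 2**n - 1 moves.
assert sum(1 for _ in hanoi_moves(4)) == 2**4 - 1

# Hypothetical numbers chosen only for illustration: assume roughly 5 tokens
# per listed move and a 64,000-token output budget.
TOKENS_PER_MOVE = 5
OUTPUT_BUDGET = 64_000

for n in (10, 12, 15, 20):
    moves = 2**n - 1
    listed = moves * TOKENS_PER_MOVE
    print(f"n={n:2d}: {moves:>9,} moves, ~{listed:>9,} tokens to list them, "
          f"fits in budget: {listed <= OUTPUT_BUDGET}")
```

On these assumed numbers, the full move listing still fits the budget at 12 disks but not at 15 or 20, which is the kind of cutoff the rebuttal argues can masquerade as a reasoning collapse.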
1. Chinese scientists confirm AI capable of spontaneously forming human-level cognition: "The result shows that LLMs are not 'stochastic parrots.' Instead, these models have an internal understanding of real-world concepts much like humans." Let's have a look: https://t.co/L4PXAszZmU
AI responds to the Apple paper, claiming the “illusion of thinking” is itself an illusion. Wondering how much of the original Apple paper, which had only human authors, was written by AI 😂 It would be silly for researchers not to have AI write at least 75-80% of the paper anyway 🤷 https://t.co/fb3naozEta
A look at seven rebuttals to Apple's paper on limitations of Large Reasoning Models, and why none make a compelling case (@garymarcus / Marcus on AI) https://t.co/0tmjZCKZFq https://t.co/Dd7x0yZ9Mb https://t.co/ZOzeer1FAj