Aug 28, 01:59 PM

Sapient’s 27-Million-Parameter AI Surpasses ChatGPT in Key Reasoning Test

Singapore-based AI firm Sapient has unveiled a Hierarchical Reasoning Model (HRM) that it says can outperform far larger language models on demanding reasoning benchmarks, underscoring a shift toward efficiency-oriented architectures. Despite containing just 27 million parameters and being trained on roughly 1,000 examples, HRM scored 40.3% on the ARC-AGI-1 benchmark and 5% on the more difficult ARC-AGI-2 test. Those results top OpenAI’s o3-mini-high, which posted 34.5% and 3% respectively, and exceed the scores reported for Anthropic’s Claude 3.7 and DeepSeek R1, systems that rely on billions of parameters.

The model splits computation between a slow, high-level planning module and a fast, low-level execution module, mirroring the brain’s multi-timescale processing. A pre-print on arXiv says this design allows HRM to solve structured tasks such as complex Sudoku puzzles and maze navigation in a single forward pass. The ARC-AGI benchmark team reproduced the headline results and noted that an iterative refinement step during training contributed materially to the gains.

Sapient’s announcement came a day after U.S.-based Nous Research released “Hermes 4,” an open-weight family of Llama 3.1-based models with 14 billion, 70 billion and 405 billion parameters. Hermes 4 introduces a tag that lets users toggle between rapid replies and deeper reasoning, is trained on a dataset 50 times larger than its predecessor’s, and incorporates an anti-sycophancy bias aimed at reducing flattering responses.

The back-to-back releases highlight intensifying efforts to improve AI reasoning through architectural innovation rather than ever-growing scale, offering a potential counterweight to the parameter-heavy approach embodied by models such as GPT-5.