Claude 3.7 Sonnet has demonstrated improved performance with its extended thinking mode, recently climbing to second place in testing rankings. Tests with larger thinking budgets, including 20K and 40K tokens, showed no discernible difference in results. In the ARC Prize tests referenced below, the base model achieved a 12.2% success rate at $0.05 per task, while extended thinking scored 11.6% with a 1K-token budget ($0.07/task), 21% at 8K tokens ($0.21/task), and 28.6% at 16K tokens ($0.33/task); overall performance is comparable to o3-mini at a slightly higher cost per task. The extended thinking feature will also be supported in the next update of Live AI Assistant, which will surface the model's thinking process. Extended thinking supports an output limit of up to 128K tokens, although long thinking runs can take several minutes to complete.
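As a rough illustration of how the thinking budgets quoted above map onto an API request, here is a minimal sketch using the Anthropic Python SDK. The model ID, prompt, and budget values are assumptions chosen to mirror the figures discussed, not details taken from the posts below.

```python
# Minimal sketch of enabling extended thinking via the Anthropic Python SDK.
# Assumptions: the `anthropic` package is installed, ANTHROPIC_API_KEY is set,
# and the model ID / budget values below are illustrative only.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # assumed model ID for Claude 3.7 Sonnet
    max_tokens=20000,                    # must exceed the thinking budget
    thinking={
        "type": "enabled",
        "budget_tokens": 16000,          # one of the budgets compared above (1K/8K/16K)
    },
    messages=[{"role": "user", "content": "Solve this step by step: ..."}],
)

# The response interleaves "thinking" blocks (the reasoning trace) with "text" blocks.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200], "...")
    elif block.type == "text":
        print(block.text)
```

The budget caps how many tokens the model may spend reasoning before it answers, which is why the cost per task scales with the budget sizes listed above.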
A look at Claude 3.7 Sonnet's extended thinking mode and its 128K token output limit; long thinking runs are impressive but can take several minutes to complete (@simonw / Simon Willison's Weblog) https://t.co/2H7GZQqexJ https://t.co/d6p4eLpGkH https://t.co/ZOzeer1FAj
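Because long thinking runs can take several minutes, a streaming request avoids holding a single blocking call open for the full duration. The sketch below assumes the Anthropic Python SDK's streaming helper with an illustrative 32K thinking budget; the separate beta flag needed for the full 128K output limit is not shown.

```python
# Sketch: stream a long extended-thinking run so a multi-minute response
# arrives incrementally instead of in one blocking call.
# Assumptions: `anthropic` SDK installed; model ID and budgets are illustrative.
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-3-7-sonnet-20250219",  # assumed model ID
    max_tokens=64000,
    thinking={"type": "enabled", "budget_tokens": 32000},
    messages=[{"role": "user", "content": "Write a detailed analysis of ..."}],
) as stream:
    for event in stream:
        # Thinking deltas and text deltas arrive as separate event types.
        if event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
```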
Claude Sonnet 3.7 Extended Thinking is great; it will be supported in the next update of Live AI Assistant, with an overview of its "thinking process" https://t.co/JbAkCgoJPa
We just tested @arcprize on Claude 3.7 Few thoughts: * Performance is on par with o3-mini for slightly increased cost * Scaling curve for thinking is convex at low thinking, this would likely change at higher thinking tokens * Perf with the base model is on par with R1 https://t.co/SSKRwReG1H