Anthropic has released Claude 3.7 Sonnet, which features hybrid reasoning, enhanced coding capabilities, and the ability to output 128K tokens. The new model reportedly surpasses OpenAI's GPT-4o in coding tasks and demonstrates advanced skills in playing Pokémon Red. In a related development, researchers at Hao AI Lab have begun benchmarking various AI models, including Claude 3.7, Claude 3.5, Gemini 1.5 Pro, and GPT-4o, using Super Mario Bros. The tests evaluate the models' performance in real-time scenarios and have revealed notable differences in their reasoning and speed. The benchmarking initiative has sparked interest in how different AI systems handle the challenges presented by classic video games.
🚨 Researchers Are Using Super Mario to Benchmark AI Models. Last week, Anthropic showcased Claude 3.7's Pokémon-playing abilities. But researchers argue that Super Mario Bros. is even tougher. Hao AI Lab tested Claude 3.7, Claude 3.5, Gemini 1.5 Pro, and GPT-4o on their… https://t.co/PmES75iIB5
Have you ever wondered who'd win in a debate — @OpenAI GPT-4o or @AnthropicAI Claude 3.7 Sonnet? Wonder no more... https://t.co/hLTv2N5wp6 https://t.co/6tfgAOCtHi