DeepNewz, mobile.
People-sourced. AI-powered. Unbiased News.
Jan 21, 05:39 AM
DeepSeek R1 Scores 57% on Aider Benchmark, Ranks Second to o1 at 62%, Features 'Deep Research'
AI Modeling
AI Products
AI

Authors
  • gmoney.9dcc.e τh
  • Aravind Srinivas
  • Bindu Reddy

DeepSeek R1 scored 57% on the Aider polyglot benchmark, placing second behind o1 at 62%. Sonnet followed at 52% and DeepSeek Chat V3 at 48%. The Aider leaderboard measures how well models complete code-editing tasks across several programming languages; the search-related observations below come from user reports, not from the benchmark.

Users report that DeepSeek R1 is strong at web search, matching GPT-4o, and that its 'Deep Research' feature combines search with reasoning, which puts it in competition with similar features from Gemini and Perplexity. User feedback suggests R1 may hold an advantage over competitors in accessing the web and handling complex queries, although some users noted a tendency to produce unnecessary code output.

Written with ChatGPT (GPT-4o mini).
