Recent advances in AI software development agents have produced significant gains on the SWE-bench benchmarks. OpenHands CodeAct 2.1, developed by All Hands AI, has achieved a state-of-the-art resolve rate of 53% on SWE-Bench Verified and 41.7% on SWE-Bench Lite, surpassing the previous record of 49% set by Anthropic with Claude 3.5 Sonnet. The improvements in OpenHands CodeAct 2.1 are attributed to its use of function calling and its integration of Claude 3.5 Sonnet. By comparison, OpenAI's o1-preview and GPT-4o recorded resolve rates of 38.4% and 33.2%, respectively. The rapid progress of AI coding agents highlights how quickly this field is evolving, with SWE-bench serving as the benchmark for evaluating how effectively they resolve real GitHub issues.
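Here, "function calling" means the model returns structured tool invocations that the agent can execute directly, rather than free-form text the agent must parse. The sketch below is only an illustration of that pattern using Anthropic's tool-use API, not OpenHands' actual implementation; the tool name, schema, and prompt are assumptions made for the example.

# Minimal sketch (illustrative, not OpenHands' code): registering a
# hypothetical file-editing tool with Claude 3.5 Sonnet via tool use.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

edit_tool = {
    "name": "edit_file",  # hypothetical tool name for this example
    "description": "Replace a snippet of text in a repository file.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path to the file to edit"},
            "old_text": {"type": "string", "description": "Exact text to replace"},
            "new_text": {"type": "string", "description": "Replacement text"},
        },
        "required": ["path", "old_text", "new_text"],
    },
}

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[edit_tool],
    messages=[{"role": "user", "content": "Fix the off-by-one error in utils/pagination.py"}],
)

# If the model decides to call the tool, the response contains a structured
# tool_use block with machine-readable arguments instead of free-form text.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)

With structured arguments like these, an agent can apply the edit, send the outcome back as a tool result, and continue the loop until the issue is resolved.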
All Hands AI Open Sources OpenHands CodeAct 2.1: A New Software Development Agent, the First to Solve Over 50% of Real GitHub Issues in SWE-Bench https://t.co/JvnN2Uy4bD
Open source AllHands + Claude 3.5 Sonnet is now #1 on SWE-Bench Verified with 53%!
Anthropic had posted 49%
GPT o1-preview was 38.4%
4-o was 33.2%
Devin launched 6mo ago at 13.86% on SWE-Bench (~25% on Verified)
Before that, sota was 2%
Progress in AI coding agents is so fast. https://t.co/LP3amraMbS
Best software development AI agent!? OpenHands CodeAct 2.1 achieves state-of-the-art results:
🥇 53% resolve rate on SWE-Bench Verified
🥇 41.7% resolve rate on SWE-Bench Lite
Improvements thanks to function calling, use of Anthropic's Claude 3.5 model, and optimizing… https://t.co/wi8XSyT9PB