Several new autonomous coding agents have been announced. SWE-agent, an open-source LLM-based system, resolves 12.5% of the software engineering issues in SWE-bench and reaches a pass@6 rate of 32.67% on SWE-bench Lite. Meanwhile, OpenDevin, a fully open-source counterpart to Devin aimed at executing complex engineering tasks, has released OpenDevin CodeAct 1.0, which reports a 21% unassisted resolve rate on SWE-bench Lite, a 17% relative improvement over the previous state of the art set by SWE-agent.
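To make the two metrics above concrete, here is a minimal Python sketch of how a resolve rate and a pass@k figure could be computed from per-issue outcomes. It assumes pass@k means "at least one of the first k attempts resolved the issue"; the function names and the toy data are hypothetical and are not taken from the SWE-bench or OpenDevin tooling.

```python
from typing import Dict, List


def resolve_rate(attempts: Dict[str, List[bool]]) -> float:
    """Fraction of issues resolved on the first attempt (same as pass@1 here)."""
    return pass_at_k(attempts, k=1)


def pass_at_k(attempts: Dict[str, List[bool]], k: int) -> float:
    """Fraction of issues resolved by at least one of the first k attempts.

    Assumption: this is the simple empirical definition, not a statistical
    estimator over a larger pool of samples.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts:
        raise ValueError("no issues to score")
    solved = sum(1 for runs in attempts.values() if any(runs[:k]))
    return solved / len(attempts)


# Hypothetical outcomes: issue id -> whether each attempt resolved the issue.
toy_attempts = {
    "issue-a": [False, True],
    "issue-b": [False, False],
    "issue-c": [True, False],
    "issue-d": [False, False],
}

print(f"resolve rate: {resolve_rate(toy_attempts):.2%}")
print(f"pass@2:       {pass_at_k(toy_attempts, k=2):.2%}")
```

Under this definition pass@k can only stay level or rise as k grows, which is consistent with a pass@6 figure being well above a single-attempt resolve rate.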
🤔 How do we build an environment to evaluate AI coding agents like Devin, SWE-agent, and OpenDevin? Check out 🤖R2E: Converting any GitHub Repository into a Programming Agent Environment. We can even use LLMs to optimize the speed of the code! 🚀 Want to use R2E to evaluate… https://t.co/JiX1IdbybC https://t.co/G2OMGxCwES
Nice work! 21% on SWE-bench from OpenDevin https://t.co/dfsgoZCRXT
Exciting news! We released a new agent, OpenDevin CodeAct 1.0, with state-of-the-art scores on SWE-bench Lite. See @xingyaow_'s thread or our blog for details: https://t.co/UMap9ey0kt This is the new default agent in OpenDevin. Happy coding! https://t.co/QvojwdaiPP https://t.co/2Gz6N0UoNi https://t.co/HNTKldTB7S