A new benchmark, WorkArena++, has been introduced to evaluate the performance of web agents. The best-performing agent achieved 0% accuracy on this benchmark, while human evaluators reached 94%. In related work, a new technique called Agent Workflow Memory (AWM) has been proposed to improve the performance of LLM-based agents on long-horizon tasks with complex action trajectories. AWM lets agents store workflows induced from past experience and reuse them on later tasks, leading to significant gains: relative success-rate improvements of 24.6% on Mind2Web and 51.1% on WebArena, while also reducing the number of steps required to complete these tasks. For more details, refer to the repo and the arXiv abstract.
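To make the store-and-reuse idea concrete, here is a minimal sketch of an AWM-style workflow memory. This is an illustrative toy, not the paper's implementation: the `WorkflowMemory` class, its `add`/`retrieve` methods, and the word-overlap retrieval are all assumptions standing in for the paper's LLM-based workflow induction and retrieval.

```python
from dataclasses import dataclass, field

@dataclass
class Workflow:
    task: str          # natural-language description of the solved task
    steps: list[str]   # induced action trajectory, e.g. "click('Search')"

@dataclass
class WorkflowMemory:
    workflows: list[Workflow] = field(default_factory=list)

    def add(self, task: str, steps: list[str]) -> None:
        """Store a workflow once the agent judges a task successfully completed."""
        self.workflows.append(Workflow(task, steps))

    def retrieve(self, query: str, k: int = 2) -> list[Workflow]:
        """Return the k stored workflows most related to the new task.
        Naive word overlap stands in for the paper's retrieval step."""
        query_words = set(query.lower().split())
        ranked = sorted(
            self.workflows,
            key=lambda w: len(query_words & set(w.task.lower().split())),
            reverse=True,
        )
        return ranked[:k]

# After each (predicted) success, the agent adds a workflow to memory...
mem = WorkflowMemory()
mem.add("search flights to Tokyo",
        ["open airline site", "type 'Tokyo'", "click search"])
mem.add("book a hotel in Paris",
        ["open booking site", "type 'Paris'", "filter by price"])

# ...and on a new task, related workflows are provided to the agent on demand.
hits = mem.retrieve("search flights to Osaka", k=1)
print(hits[0].task)  # → search flights to Tokyo
```

Because memory grows as tasks are completed, the same loop works both offline (memory induced from a training set) and online (memory accumulated during deployment), which is the setting the paper evaluates.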
LLM-based models still struggle with long-horizon tasks with complex action trajectories. This paper introduces Agent Workflow Memory to induce commonly reused workflows and provide these to the agent on demand. Works offline and online and is meant to guide the agent's… https://t.co/UQ0aM7C6WN
Check out new work on allowing agents to learn and improve from their experiences: Agent Workflow Memory. Every time an agent predicts that a task has been successfully completed, it stores "workflows" in its memory to use in future tasks. Great results on web browsing tasks! https://t.co/T5yFwVYvUm
How can we create AI agents that continually improve, learning from past successes? Presenting 🌟Agent Workflow Memory🌟, which allows agents to induce, learn, and use task workflows from experiences on the fly🪽 Adding AWM to a strong agent improves accuracy by 51.1% on… https://t.co/lMh9imOaP1