The paper presents AppWorld, a comprehensive benchmarking environment for interactive coding agents, featuring a diverse set of tasks across multiple apps and APIs, revealing significant challenges for current models like GPT-4o. https://t.co/klujJPVlUu
Want to build or test Interactive Coding Agents? Check out AppWorld, an exciting new multi-app simulated environment and benchmark from @stonybrooku and @allen_ai! https://t.co/DPL3MxuwqX
Can your AI return your online orders 📦 or cancel work meetings based on your emails ✉️? That's what AppWorld is here to help determine! This simulated world of apps and people will help researchers benchmark interactive coding agents for day-to-day digital tasks. Visit the site:… https://t.co/yDEw7gtLu8

A new open-source AI framework called Odyssey has been introduced. It is designed to give large language model (LLM)-based agents open-world skills for exploring the Minecraft environment, with the aim of improving autonomous exploration and task execution. Separately, AppWorld has been launched as a multi-app simulated environment and benchmark for interactive coding agents. Developed by researchers at Stony Brook University and the Allen Institute for AI, AppWorld tests whether agents can carry out day-to-day digital tasks, such as returning online orders or canceling work meetings based on emails, by working across multiple apps and APIs. Its diverse set of tasks poses significant challenges for current models, including GPT-4o, and the benchmark is intended to help researchers evaluate how well agents interact with such applications.