OS-ATLAS is an open-source foundation action model for graphical user interface (GUI) agents. It targets the performance gap between open-source vision-language models (VLMs) and proprietary models such as GPT-4V, and introduces innovations in both data and modeling aimed at GUI scenarios. AutoGLM, a separate line of work, builds autonomous foundation agents for GUIs, with web browsers and Android as the representative scenarios. Its authors found it essential to design an intermediate interface that disentangles planning from grounding behaviors in GUI agents, and the work reports state-of-the-art (SOTA) results.
"OS-ATLAS: A Foundation Action Model For Generalist GUI Agents" https://t.co/DTp55ir6Ct
🏷️:OS-ATLAS: A Foundation Action Model for Generalist GUI Agents 🔗:https://t.co/VsEj2C6xuO https://t.co/hcogsg9f3Z
AutoGLM: Autonomous Foundation Agents for GUIs. Focuses on Web Browser and Android as the representative GUI scenarios. Found it essential to design an intermediate interface that disentangles planning and grounding behaviors in foundation GUI agents. Achieves SOTA results in… https://t.co/xuaklITdDT
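
The idea of an intermediate interface between planning and grounding can be illustrated with a toy sketch. This is not AutoGLM's implementation: the names `PlannedStep`, `GroundedAction`, `plan`, `ground`, and `run` are hypothetical, and both the planner and the grounder are simple stand-ins for what would be learned models. The point is only that the planner emits steps that describe UI elements in words, and a separate grounder resolves those descriptions to screen coordinates.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PlannedStep:
    """Intermediate interface: an action plus a verbal element description."""
    action: str      # "click" or "type"
    target: str      # natural-language description, no coordinates
    text: str = ""   # payload for "type" actions


@dataclass(frozen=True)
class GroundedAction:
    """Executable action with concrete screen coordinates."""
    action: str
    x: int
    y: int
    text: str = ""


def plan(task: str) -> List[PlannedStep]:
    """Toy planner (stand-in for a planning model): handles 'search for <query>'."""
    prefix = "search for "
    if not task.startswith(prefix):
        raise ValueError(f"toy planner cannot handle task: {task!r}")
    query = task[len(prefix):]
    return [
        PlannedStep("click", "search box"),
        PlannedStep("type", "search box", query),
        PlannedStep("click", "search button"),
    ]


def ground(step: PlannedStep, screen: Dict[str, Tuple[int, int]]) -> GroundedAction:
    """Toy grounder (stand-in for a grounding model): a dictionary lookup."""
    x, y = screen[step.target]  # raises KeyError if the element is not on screen
    return GroundedAction(step.action, x, y, step.text)


def run(task: str, screen: Dict[str, Tuple[int, int]]) -> List[GroundedAction]:
    """Plan first, then ground each step; the two stages share only PlannedStep."""
    return [ground(step, screen) for step in plan(task)]


if __name__ == "__main__":
    # Hypothetical screen layout: element description -> (x, y) center.
    screen = {"search box": (120, 40), "search button": (300, 40)}
    for action in run("search for cats", screen):
        print(action)
```

Because the two stages communicate only through `PlannedStep`, either one could in principle be replaced or evaluated independently without touching the other, which is the kind of decoupling the tweet describes.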