On-device #AgentCPM-GUI is now open source. Built on #MiniCPM-V with 8 billion parameters, it accepts smartphone screenshots as input and autonomously executes user-specified tasks. 🎯 Key features: - First open-source GUI agent fine-tuned for Chinese apps - RFT-enhanced https://t.co/0Lqoe8DCjg
AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient task execution. → It processes smartphone screenshots as input to autonomously execute user-specified tasks on mobile applications. → https://t.co/491XzRqArj
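Tsinghua University and ModelBest (面壁智能, Mianbi Intelligence) have jointly open-sourced an agent that operates Android interfaces. It is the first open-source GUI agent finely tuned for Chinese apps, covering more than 30 mainstream Chinese apps such as Gaode Map (高德地图), Dazhong Dianping (大众点评), Bilibili (哔哩哔哩), and Xiaohongshu (小红书); the average action length is compressed to 9.7 tokens, improving on-device inference efficiency. https://t.co/xLEu5vdU8h https://t.co/FMYeiVgoFl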
OpenBMB and collaborators, including Tsinghua University and ModelBest (面壁智能, Mianbi Intelligence), have released AgentCPM-GUI, an open-source, on-device graphical user interface (GUI) agent that operates Android applications autonomously. Built on the 8-billion-parameter MiniCPM-V model, AgentCPM-GUI takes smartphone screenshots as input and executes user-specified tasks. It is described as the first open-source GUI agent fine-tuned specifically for Chinese apps, covering more than 30 mainstream applications such as Gaode Map, Dazhong Dianping, Bilibili, and Xiaohongshu. The agent offers bilingual grounding trained on an Android dataset and uses reinforcement fine-tuning (RFT) to strengthen planning and reasoning. It also compresses the average action length to 9.7 tokens, which reduces the number of tokens the model must generate per step and so improves inference efficiency on mobile devices. The project emphasizes on-device operation, which it presents as more trustworthy than relying on remote servers.
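
To illustrate why a short action string helps on-device inference, here is a minimal Python sketch. It assumes a hypothetical JSON action format; the field names `action` and `point` are invented for illustration and are not AgentCPM-GUI's actual schema. Character count stands in as a rough proxy for token count, since each extra output token costs the model another decoding step.

```python
import json

# Hypothetical action: tap a point on the screen.
# Field names are illustrative only, not the project's real schema.
action = {"action": "tap", "point": [512, 384]}


def encode_verbose(a: dict) -> str:
    """Pretty-printed JSON: readable, but padded with whitespace."""
    return json.dumps(a, indent=2)


def encode_compact(a: dict) -> str:
    """Compact JSON: no spaces after separators, no newlines."""
    return json.dumps(a, separators=(",", ":"))


verbose = encode_verbose(action)
compact = encode_compact(action)

print(compact)
print(len(verbose), "characters verbose vs", len(compact), "compact")

# The compact form still decodes to the same action.
assert json.loads(compact) == action
```

The same information is carried in fewer characters, so a model trained to emit the compact form has less to generate per step. The real system's format and tokenization may differ from this sketch.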