Microsoft has launched OmniParser V2, an advanced screen parsing tool that enhances the capabilities of large language models (LLMs) by enabling them to interact with computer screens. The new version is reported to be 60% faster than its predecessor, OmniParser V1, achieving sub-second latency on high-performance graphics cards such as the NVIDIA GeForce RTX 4090. OmniParser V2 converts UI screenshots into structured data, allowing models such as GPT-4, DeepSeek R1, and Sonnet 3.5 to understand and act upon the information displayed on screen. The tool is open source and available under the MIT license, making it straightforward to integrate with a variety of models and agents. It also supports multiple platforms, including Windows, macOS, Android, iOS, and web applications, broadening its applicability in web automation and AI-driven tasks.
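To make the "screenshots into structured data" idea concrete, here is a minimal illustrative sketch of the kind of element list a screen parser like OmniParser V2 might emit, and how it could be serialized into a text prompt for an LLM agent. All names here (`UIElement`, `elements_to_prompt`, the sample elements) are hypothetical for illustration and do not reflect the actual OmniParser API.

```python
from dataclasses import dataclass

# Hypothetical representation of one parsed UI element: a bounding box,
# a type, a caption, and whether an agent could act on it. This mirrors
# the general idea of structured screen parsing, not OmniParser's real
# output schema.
@dataclass
class UIElement:
    elem_type: str     # e.g. "button", "text", "icon"
    caption: str       # description of what the element shows
    bbox: tuple        # (x1, y1, x2, y2) in pixel coordinates
    interactable: bool # can an agent click/type here?

def elements_to_prompt(elements):
    """Serialize parsed elements into a compact, numbered text block
    that a downstream LLM can reason over when choosing an action."""
    lines = []
    for i, e in enumerate(elements):
        flag = "interactable" if e.interactable else "static"
        lines.append(f"[{i}] {e.elem_type} ({flag}) at {e.bbox}: {e.caption}")
    return "\n".join(lines)

# Example: two elements a parser might extract from a login screen.
parsed = [
    UIElement("button", "Sign in button", (420, 610, 520, 650), True),
    UIElement("text", "Welcome back heading", (80, 40, 400, 90), False),
]
print(elements_to_prompt(parsed))
```

An agent loop would feed this text, together with the user's goal, to the LLM and map the model's chosen element index back to the bounding box to execute a click or keystroke.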
Recently in #AI+#Robotics: Microrobots and the 'lazy agent problem': Swarm study demonstrates a solution https://t.co/gWGI2xfiMy
Microsoft AI Releases OmniParser V2: An AI Tool that Turns Any LLM into a Computer Use Agent #OmniParserV2 #AIandGUI #MicrosoftAI #LLMdevelopment #TechInnovation https://t.co/9h4yMVogEe https://t.co/v8BBPkvv5X
Try out OmniParser-v2.0 https://t.co/Fyr7pATOQd https://t.co/1ItQ76CgFC https://t.co/3Wt2JKF0fM