OpenAI has open-sourced BrowseComp, a benchmark for browser agents designed to test AI agents' ability to browse the internet and find hard-to-locate information. https://t.co/7JGO7PWqau
OpenAI Open Sources BrowseComp: A New Benchmark for Measuring the Ability of AI Agents to Browse the Web https://t.co/8u0ant0Ktj
OpenAI has announced the open-sourcing of BrowseComp, a new benchmark designed to evaluate how well AI agents can browse the internet for difficult-to-find information. The benchmark, short for "Browsing Competition", consists of 1,266 short-answer questions. Initial results show that general-purpose models with browsing capabilities, such as GPT-4.5 and GPT-4o, achieved less than 2 percent accuracy on the benchmark. In contrast, Deep Research, a specialized model trained specifically for this kind of task, reached 51.5 percent accuracy. OpenAI intends BrowseComp to serve as a challenging arena for AI agents, akin to competitive programming or math contests, and thereby to sharpen the evaluation of browsing intelligence.
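To give a concrete sense of how a benchmark like this runs, here is a minimal sketch of a BrowseComp-style evaluation loop in Python. The `ask_agent` helper, the model name, and the exact-match scoring are illustrative assumptions, not OpenAI's actual harness; the real benchmark ships with OpenAI's evaluation tooling and grades free-form answers with a model-based grader rather than strict string comparison.

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def ask_agent(question: str) -> str:
    """Query a model for a short answer (hypothetical setup; a real run
    would use a browsing-capable agent, not a bare chat completion)."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; swap in any browsing-capable agent
        messages=[
            {"role": "system", "content": "Reply with a single short answer."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()


def evaluate(pairs: list[tuple[str, str]]) -> float:
    """Accuracy over (question, reference answer) pairs.
    Exact match is a simplification of the benchmark's grading."""
    correct = sum(ask_agent(q).lower() == ref.lower() for q, ref in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    # Two toy items standing in for the benchmark's 1,266 real questions.
    sample = [
        ("What year was the Eiffel Tower completed?", "1889"),
        ("Who wrote 'The Selfish Gene'?", "Richard Dawkins"),
    ]
    print(f"accuracy: {evaluate(sample):.1%}")
```

The single-number accuracy metric is what makes the reported gap so stark: below 2 percent for general-purpose models versus 51.5 percent for an agent trained to browse persistently.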