The AI model o3 has demonstrated advanced image analysis and reasoning across several tasks. Given a photo of a flower garden, it identified individual plants and correctly named 10 of 15 species. It solved a crossword puzzle in 11 minutes and 6 seconds by analyzing the image several times and searching many websites for clues. In a separate puzzle run, it found every word after thinking for almost 7 minutes, though its markings on the image were shifted one square to the right. o3 also decoded a puzzle involving Nvidia's H100/A100 FLOP ratio by cropping the image and researching the relevant data online. Finally, given a photo of a Chinese restaurant menu in San Francisco with no title or EXIF data, it located the restaurant by searching the web and matching menu items. These examples highlight o3's ability to combine tool-assisted image zooming and cropping with web search to carry out multi-step reasoning that depends on close attention to visual detail.
Looks like o3 is the first model to solve this crossword puzzle! Time: 11 minutes and 6 seconds. Tactic: Analyzed the image several times and visited many websites to look for clues. I don't think I've ever seen this level of reasoning and tool usage from a model. https://t.co/LUREskwtVG
o3 really blew my mind with this one. I gave it an image of a menu of my favorite Chinese place in SF with no title or EXIF data, and it was able to search the web, match menu items, and locate it. 🤯 https://t.co/siphWJJD5Q
o3 solves a crossword puzzle and finds all the words perfectly! It marks the words on the image, but the marks are one square off to the right. It thinks for almost 7 minutes. https://t.co/xcynJtplt8 https://t.co/YW4Rms3Oib