Apr 21, 12:33 PM

OpenAI's o3 and o4-mini, Google's Gemini 2.5 Flash, Project Astra, and Genie 2 Advance AI

OpenAI has released two new reasoning models, o3 and o4-mini, which show advanced visual perception and reasoning. The models can work with images inside their chain of thought, manipulating them (for example by cropping, zooming, or rotating) to solve complex problems. o3 in particular has drawn attention for identifying locations from photos, and OpenAI reports that it reaches 95.7% accuracy on the V* visual search benchmark. Both models are available to ChatGPT Plus users, and in the API each offers a 200,000-token context window.

Google has introduced Gemini 2.5 Flash, a model designed to improve reasoning while keeping speed and cost low. It has a knowledge cutoff of January 2025 and lets developers control how much reasoning is applied to each prompt, which makes it adaptable to a range of applications. In Google's benchmark results it scored 12% on Humanity's Last Exam (HLE), ahead of several competitor models including Claude 3.7 Sonnet and DeepSeek R1, and 47.1% on the Aider Polyglot coding benchmark. The model is available in preview through the Gemini API, Google AI Studio, and Vertex AI. It is priced at $0.15 per million input tokens and $0.60 per million output tokens with reasoning turned off, with output rising to $3.50 per million tokens when reasoning is enabled.

Google DeepMind, led by Demis Hassabis, is also advancing its research with Project Astra and Genie 2. Project Astra aims to build a multimodal AI agent that can understand and interact with the physical world, while Genie 2 generates interactive 3D environments from a single static image. These projects signal Google's push toward more advanced AI systems. Hassabis has also said he is optimistic that AI could help cure all diseases within the next decade.
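The combination of adjustable reasoning and per-token pricing described above lends itself to a back-of-the-envelope cost estimate. The sketch below is not Google's billing logic: the function name `estimate_cost_usd`, the assumption that reasoning tokens are billed at the output rate, and the rule that they are capped at a developer-set budget are all simplifications made for illustration. Rates are passed in as parameters so they can be replaced with whatever Google currently publishes.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    reasoning_tokens: int,
    reasoning_budget: int,
    input_usd_per_million: float,
    output_usd_per_million: float,
) -> float:
    """Back-of-the-envelope request cost in US dollars.

    `reasoning_tokens` is how many reasoning tokens the model would spend
    if unconstrained. Simplifying assumptions (hypothetical, not Google's
    billing rules):
    - reasoning tokens are billed at the output rate;
    - reasoning is capped at `reasoning_budget` tokens, so a budget of 0
      means reasoning is effectively turned off.
    """
    counts = (input_tokens, output_tokens, reasoning_tokens, reasoning_budget)
    if any(n < 0 for n in counts):
        raise ValueError("token counts and budget must be non-negative")

    # Only the reasoning tokens that fit within the budget are billed.
    billed_reasoning = min(reasoning_tokens, reasoning_budget)
    billed_output = output_tokens + billed_reasoning

    # Prices are quoted per one million tokens.
    return (
        input_tokens * input_usd_per_million
        + billed_output * output_usd_per_million
    ) / 1_000_000


if __name__ == "__main__":
    # Illustrative rates only; check the current Gemini pricing page.
    rates = {"input_usd_per_million": 0.15, "output_usd_per_million": 0.60}
    off = estimate_cost_usd(10_000, 1_000, 5_000, 0, **rates)
    on = estimate_cost_usd(10_000, 1_000, 5_000, 2_048, **rates)
    print(f"reasoning off: ${off:.6f}, reasoning on: ${on:.6f}")
```

With these illustrative rates the script prints `reasoning off: $0.002100, reasoning on: $0.003329`. Under this simplified model, raising the budget can only increase the billed output, which is the speed/cost versus reasoning-depth trade-off that a controllable reasoning level exposes to developers.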