Google DeepMind on Tuesday introduced Genie 3, its most advanced “world model,” capable of generating real-time, fully navigable 3D environments from a single text prompt. The system renders at 720p and 24 frames per second and can preserve visual and physical consistency for several minutes, an order-of-magnitude jump from the 10- to 20-second, 360p output of last year’s Genie 2.

Genie 3 keeps track of objects for roughly one minute via a short-term “world memory,” allowing users to leave and revisit a scene without losing context. A new feature for “promptable world events” lets creators alter weather, add characters or otherwise reshape the scene on the fly, while end-to-end control latency is about 50 milliseconds, according to the research team.

DeepMind executives said the technology is intended not only for next-generation game and media production but also for training embodied AI agents and robots in simulated environments, a capability they describe as a critical step toward artificial general intelligence. Early tests show the model supporting DeepMind’s SIMA agent in completing goal-oriented tasks inside the generated worlds.

The software remains in a restricted research preview, with access limited to select academics and creators while the company works on longer simulation horizons, multi-agent interactions and geographic accuracy. DeepMind has not yet disclosed hardware requirements or a timeline for broader release.
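The embodied-agent use case described above amounts to a standard observe-and-act loop between an agent and the generated world. The Python sketch below is purely illustrative: Genie 3 has no public API, so `GeneratedWorld`, `RandomAgent`, `inject_event` and every other name here are hypothetical stand-ins that only echo the specs reported in this piece (720p frames, 24 fps, text-prompted scenes, promptable world events).

```python
# Minimal sketch of training/evaluating an agent inside a text-prompted world.
# NOTE: Genie 3 exposes no public API; GeneratedWorld, RandomAgent and the
# action set are hypothetical stand-ins mirroring only the reported specs
# (720p frames at 24 fps, text-prompted scenes, promptable world events).
import random
import time

FRAME_BYTES = 720 * 1280 * 3   # one 720p RGB frame, per the reported resolution
FRAME_INTERVAL = 1.0 / 24      # 24 fps, per the reported frame rate


class GeneratedWorld:
    """Stand-in for a text-prompted, navigable world model (stubbed)."""

    def __init__(self, prompt: str):
        self.prompt = prompt
        self.frames_rendered = 0

    def reset(self) -> bytes:
        """Start a new episode and return the first rendered frame."""
        self.frames_rendered = 0
        return bytes(FRAME_BYTES)          # placeholder pixel buffer

    def step(self, action: str) -> bytes:
        """Advance one frame in response to a navigation action."""
        self.frames_rendered += 1
        return bytes(FRAME_BYTES)          # placeholder pixel buffer

    def inject_event(self, event: str) -> None:
        """Stand-in for a 'promptable world event' (e.g. changing the weather)."""
        print(f"[world event] {event}")


class RandomAgent:
    """Trivial policy; a SIMA-style agent would pick actions toward a goal."""

    ACTIONS = ["forward", "back", "left", "right", "jump"]

    def act(self, frame: bytes) -> str:
        return random.choice(self.ACTIONS)


def run_episode(prompt: str, goal: str, seconds: int = 3) -> None:
    world = GeneratedWorld(prompt)
    agent = RandomAgent()
    frame = world.reset()
    world.inject_event("a rainstorm begins")        # promptable world event
    for _ in range(seconds * 24):                   # play for a few seconds at 24 fps
        action = agent.act(frame)
        frame = world.step(action)
        time.sleep(FRAME_INTERVAL)                  # pace the loop at ~24 fps
    print(f"goal: {goal!r}; frames rendered: {world.frames_rendered}")


if __name__ == "__main__":
    run_episode("a warehouse with stacked crates", goal="reach the loading dock")
```

The loop structure is the point, not the stubs: the world model plays the role a hand-built simulator would normally play, which is why DeepMind frames generated environments as a path to training embodied agents at scale.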
Genie 3 by Google DeepMind is insane. But it doesn't just generate interactive AI spatial worlds from text. It also steers images & videos and chains actions to hit complex goals. 10 wild examples: 1. Step into "Nighthawks" by Edward Hopper https://t.co/2NfkaN4ERB
From “The Lawnmower Man” to Genie 3: the future we imagined with real-time virtual worlds: https://t.co/cogoNEoExj by Un informático en el lado del mal #infosec #cybersecurity #technology #news
Video-gen AI virtual salespeople are going to be insane https://t.co/0PeeyjAFX7