Building AI That Simulates Physical Reality

April 2, 2026

How DeepMind, World Labs, and AMI Labs bet on world models — AI that predicts physical outcomes rather than the next token in a sentence.

Summary

Language models predict words. World models predict what happens next in reality. That architectural difference is why some of the most prominent names in AI research have converged on world models as the next essential capability layer. Three organizations are leading the build: Google DeepMind released a real-time interactive world model based on video diffusion in August 2025; Fei-Fei Li's World Labs commercialized Marble, a navigable 3D environment simulator, in 2026; and Yann LeCun's AMI Labs closed a $1.03 billion seed round, the largest ever in Europe, to build JEPA-based AI that learns physical understanding from first principles.

What a World Model Actually Does

A language model is trained to predict the next token in a sequence. Given a passage of text, it estimates which word, phrase, or character comes next — and through this, learns to reason about language, concepts, and patterns.

A world model does something structurally different. Given the current state of a physical environment, it predicts what happens next: how objects behave under force, what a space looks like from a different vantage point, what the consequences of a particular action are. The training data is not text — it is video, sensor readings, and simulations.

The goal is AI that can simulate before it acts. Rather than describing a plan in words and hoping the description is accurate, an AI with a world model can internally simulate the outcome of a plan, identify what will fail, and revise before anything is executed in the real world.
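
To make "simulate before it acts" concrete, here is a minimal Python sketch under loud assumptions. The `predict_next` function is a hand-written projectile step standing in for a learned world model, and `State`, `simulate_landing`, and `choose_action` are hypothetical names invented for this illustration; none of them come from the systems covered in this article. The agent never throws anything for real: it rolls each candidate action forward inside the model and picks the one whose predicted landing point is closest to the target.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class State:
    x: float   # horizontal position (m)
    y: float   # height above ground (m)
    vx: float  # horizontal velocity (m/s)
    vy: float  # vertical velocity (m/s)


def predict_next(state: State, dt: float = 0.01, g: float = 9.81) -> State:
    """Stand-in for a learned world model: predicts the state one step ahead.

    ASSUMPTION: a real world model would learn this mapping from video or
    sensor data; here it is hard-coded projectile motion (explicit Euler).
    """
    return State(
        x=state.x + state.vx * dt,
        y=state.y + state.vy * dt,
        vx=state.vx,
        vy=state.vy - g * dt,
    )


def simulate_landing(vx: float, vy: float, max_steps: int = 10_000) -> float:
    """Roll the model forward from a launch at the origin; return landing x."""
    state = State(0.0, 0.0, vx, vy)
    for _ in range(max_steps):
        nxt = predict_next(state)
        if nxt.y < 0.0:  # the next step would go below ground: stop here
            return state.x
        state = nxt
    return state.x


def choose_action(
    candidates: Iterable[Tuple[float, float]], target_x: float
) -> Tuple[float, float]:
    """Pick the (vx, vy) launch whose simulated landing is nearest the target."""
    return min(candidates, key=lambda a: abs(simulate_landing(*a) - target_x))


if __name__ == "__main__":
    launches = [(2.0, 5.0), (4.0, 5.0), (6.0, 5.0)]
    print("chosen launch velocity:", choose_action(launches, target_x=4.0))
```

The design point is the separation of concerns: `choose_action` only ever calls the model, so swapping the hard-coded physics for a learned predictor would leave the planning loop unchanged.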

Three organizations are building this capability in meaningfully different ways.

Google DeepMind: Video into Interactive Simulation

In August 2025, DeepMind released a real-time interactive world model that converts video into playable simulation. The input is any video — footage of a room, an outdoor environment, a game world. The output is an interactive version of that video: the user can move through it, take actions, and the model generates the physically plausible next frame in real time.

DeepMind's approach avoids hand-crafted physics rules entirely. The model learns physical dynamics from training on massive video datasets — effectively inferring how the world works from watching it. The simulations respect gravity, occlusion, object permanence, and the basic structure of physical space without any explicit rules encoding those properties.
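
The interaction pattern described here (recent frames plus a user action go in, one new frame comes out, and that frame becomes context for the next step) can be sketched as a simple loop. This is an illustration only and not DeepMind's implementation: `toy_next_frame` is a hypothetical placeholder that just shifts a tiny grid left or right, where a real system would call a learned video generator, and the names `interactive_rollout` and `context_len` are assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Frame = List[List[int]]  # toy stand-in for an image: a grid of pixel values


def toy_next_frame(history: Sequence[Frame], action: str) -> Frame:
    """Hypothetical placeholder for a learned frame generator.

    It only looks at the most recent frame and shifts each row one cell
    in the direction of the action (wrapping around at the edges).
    """
    last = history[-1]
    if action == "right":
        return [row[-1:] + row[:-1] for row in last]
    if action == "left":
        return [row[1:] + row[:1] for row in last]
    return [row[:] for row in last]  # unknown action: repeat the frame


def interactive_rollout(
    seed_frames: Sequence[Frame],
    actions: Sequence[str],
    next_frame: Callable[[Sequence[Frame], str], Frame] = toy_next_frame,
    context_len: int = 4,
) -> List[Frame]:
    """Generate one frame per action, feeding each output back in as context."""
    history: Deque[Frame] = deque(seed_frames, maxlen=context_len)
    generated: List[Frame] = []
    for action in actions:
        frame = next_frame(list(history), action)
        history.append(frame)  # the generated frame conditions the next step
        generated.append(frame)
    return generated


if __name__ == "__main__":
    seed = [[[1, 0, 0]]]  # one seed frame with a single 3-pixel row
    for frame in interactive_rollout(seed, ["right", "right", "left"]):
        print(frame)
```

Because every generated frame is fed back in as context, any error the generator makes is carried into later frames, which is one reason long, consistent rollouts are hard.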

Current applications:

Robotics training environments: generate unlimited novel scenarios from reference footage instead of collecting new physical data
Game development: generate interactive prototype environments from reference material
AI agent grounding: test plans in simulation before real-world execution

Longer-term implication: DeepMind's world model is a foundation for autonomous agents that can reason about consequences before acting — rather than acting and observing results.

World Labs: Marble, the Navigable 3D World

Fei-Fei Li — co-director of Stanford's Human-Centered AI Institute and former Google Cloud AI chief — founded World Labs to commercialize large-scale world modeling. The company's 2026 product launch is Marble: a generative model that creates real-time, navigable 3D environments from scratch.

Where DeepMind converts existing video into interactive simulation, Marble generates entirely new 3D worlds from a description or rough sketch. These environments can be explored from any angle with consistent spatial geometry and physics — a scene generated by Marble maintains structural coherence as you move through it, which prior generative systems could not achieve reliably.

Where Marble is being used:

Application	What it enables
Robotics training	Unlimited varied training environments without physical data collection
Game and XR prototyping	Prototype world layouts and environments without 3D artists
Architecture and design	Walkable building simulations from floor plans or descriptions
Scientific research	Physical environments for experiments too dangerous or expensive to run in reality

The commercial opportunity World Labs is targeting is the cost of 3D content, which is currently measured in millions of dollars and months of production time. Marble-style generation aims to compress this to hours.

AMI Labs: The $1.03 Billion Bet on JEPA

Advanced Machine Intelligence Labs, co-founded by Yann LeCun (formerly Meta's Chief AI Scientist), closed a $1.03 billion seed round, the largest European seed round ever, from a consortium of European technology investors.

AMI Labs is not building a larger language model or a better video diffusion system. It is building AI based on LeCun's Joint Embedding Predictive Architecture (JEPA), which works on fundamentally different principles from generative world models that predict pixels.

Instead of predicting raw pixels or text tokens, JEPA trains AI to predict abstract representations — the meaningful structure of a scene rather than its literal appearance. LeCun's argument: human common sense is not built from memorizing observations of the world. It is built from learning abstract models of cause and effect, physical dynamics, and object behavior at a conceptual level. JEPA attempts to replicate that learning process.

The practical difference: JEPA-based systems should generalize to novel physical situations more effectively than video diffusion models, because they are not trying to reconstruct every pixel — they are modeling the conceptual structure that generates those pixels.
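
The contrast between predicting pixels and predicting representations comes down to where the training loss is measured. The sketch below is a deliberately tiny illustration, not AMI Labs's or Meta's code: the "encoder" is an assumed fixed linear map that reduces a four-pixel patch to its mean brightness, the "predictor" is an identity map, and all function names are hypothetical. Real JEPA-style systems use deep learned encoders, typically a separate target encoder, and additional machinery to stop the representations from collapsing, none of which is shown here.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def linear(weights: Matrix, x: Vector) -> List[float]:
    """Apply a linear map (one output per row of weights)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def pixel_loss(predicted_pixels: Vector, target_pixels: Vector) -> float:
    """Generative-style objective: penalize every pixel-level mismatch."""
    return mse(predicted_pixels, target_pixels)


def embedding_loss(
    context: Vector, target: Vector, encoder: Matrix, predictor: Matrix
) -> float:
    """JEPA-style objective: compare predictions in representation space."""
    s_context = linear(encoder, context)  # representation of what was seen
    s_target = linear(encoder, target)    # representation of what to predict
    s_predicted = linear(predictor, s_context)
    return mse(s_predicted, s_target)


# ASSUMPTION: toy encoder that keeps only the mean brightness of a patch.
MEAN_ENCODER = [[0.25, 0.25, 0.25, 0.25]]
IDENTITY_PREDICTOR = [[1.0]]

if __name__ == "__main__":
    context = [1.0, 1.0, 1.0, 1.0]  # flat patch
    target = [0.5, 1.5, 0.5, 1.5]   # same mean brightness, different texture
    print("pixel loss:    ", pixel_loss(context, target))
    print("embedding loss:", embedding_loss(context, target,
                                            MEAN_ENCODER, IDENTITY_PREDICTOR))
```

In this toy case the two patches differ in texture but not in the property the encoder keeps, so the pixel objective reports an error while the representation objective does not. That is the intuition behind the generalization claim, though whether it holds at scale is exactly what AMI Labs is betting on.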

AMI Labs's thesis, in LeCun's framing: scaling language models cannot produce general AI. The missing ingredient is a world model that understands physics from first principles, not from statistical patterns in text or video.

Comparing the Three Approaches

	Google DeepMind	World Labs (Marble)	AMI Labs (JEPA)
Core approach	Video diffusion — learn physics from watching it	Generative 3D environment synthesis	Abstract representation prediction
Training data	Video footage	Multi-modal environmental data	Not released; conceptual training
Output	Interactive simulation from reference video	Novel 3D worlds from description	Abstract world model for reasoning
Stage	Released (Aug 2025)	Commercialized (2026)	Research / early build
Intended use	Robotics training, agent grounding	Robotics, game dev, architecture, XR	Long-term general AI foundation

Why This Matters Outside the Lab

World models are infrastructure for the next generation of AI products, not near-term consumer applications. But the products being built in 2026 and 2027 will increasingly depend on world modeling capabilities that are being established now:

Robotics at scale: Every company building physical AI (warehouse automation, manufacturing, delivery) needs to train robots on diverse scenarios. World models can generate varied training environments without the cost of physical data collection. DeepMind and World Labs are building a training-environment generation layer that much of the robotics industry could come to rely on.

AI agents with physical grounding: Current AI agents, including the most capable language-model-based systems, hallucinate about physical constraints because they reason about the physical world from text descriptions alone. An AI with a world model can simulate whether a plan physically works before committing to it.

3D content creation: Marble-class systems will compress 3D content production timelines and costs by orders of magnitude — with direct implications for game development, film production, architecture, and immersive media.

Frequently Asked Questions

What is a world model in AI? A world model is an AI system that builds an internal simulation of physical reality — encoding how objects behave, how cause and effect work, and what happens next after a given action. Unlike language models that predict the next text token, world models predict the next state of a physical environment. They are considered foundational for robotics, autonomous vehicles, and physical AI agents.

What is Yann LeCun building at AMI Labs? AMI Labs, co-founded by LeCun and funded with a $1.03 billion seed round (the largest seed round in European startup history), is developing AI based on JEPA, the Joint Embedding Predictive Architecture. JEPA predicts abstract representations rather than raw pixels or tokens, aiming to give AI the kind of physical common sense humans develop by learning abstract models of cause and effect rather than by memorizing observations. LeCun argues JEPA is the necessary architecture for AI that can genuinely reason about the physical world.

What is World Labs' Marble product? Marble is a large world model from World Labs (founded by Fei-Fei Li) that generates real-time navigable 3D simulations from descriptions or sketches. Unlike systems that convert existing video into simulation, Marble creates novel 3D environments with consistent physics and spatial geometry. Applications include robotics training environments, game and AR/VR prototyping, and architectural visualization.

How does Google DeepMind's world model work? DeepMind's world model, released August 2025, takes video input and converts it into an interactive simulation. The user can navigate and act within the simulated environment, and the model generates physically plausible next frames in real time. Rather than hand-coding physics rules, the model learns physical dynamics from training on large video datasets — inferring how the world behaves from observing it.

Sources

Google DeepMind — Real-time interactive world model release, August 2025
World Labs — Marble product launch and technical documentation, 2026
AMI Labs — $1.03B seed round announcement and JEPA architecture overview, 2026
Yann LeCun — JEPA research papers and public presentations, Meta AI, 2025–2026
