World Models
World models are AI systems that learn compressed internal representations of an environment — its physics, appearance, dynamics, and rules — enabling prediction of future states and generation of novel scenarios. They represent a fundamental shift from hand-authored virtual worlds to worlds that emerge from learned understanding.
The concept originates in cognitive science: humans navigate the world using internal models that predict what happens next. If you throw a ball, your brain simulates its trajectory before it lands. World models in AI attempt the same: learn the "rules" of an environment from observation, then use that understanding to predict, plan, or generate.
Early world models were relatively simple. David Ha and Jürgen Schmidhuber's 2018 "World Models" paper trained a variational autoencoder and a recurrent network to learn a compact latent representation of simple game environments. The agent could then "dream": training and planning inside the learned model rather than the real environment.
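The architecture above can be sketched in a few lines. This is a toy, hedged illustration only: the weights are random stand-ins, where the real paper uses a convolutional VAE for vision (V), an MDN-RNN for dynamics (M), and an evolution-trained linear controller (C). The dimensions and names here are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the three components (V, M, C) of the 2018 architecture.
# Real systems learn these weights from data; here they are random.
LATENT, HIDDEN, ACTIONS, OBS = 8, 16, 3, 64

W_enc = rng.normal(size=(LATENT, OBS)) * 0.1                      # V: observation -> latent z
W_h = rng.normal(size=(HIDDEN, HIDDEN + LATENT + ACTIONS)) * 0.1  # M: recurrent dynamics
W_z = rng.normal(size=(LATENT, HIDDEN)) * 0.1                     # M: predict next latent
W_c = rng.normal(size=(ACTIONS, LATENT + HIDDEN)) * 0.1           # C: simple controller

def encode(obs):
    """V compresses a raw observation into a small latent vector z."""
    return np.tanh(W_enc @ obs)

def dream_rollout(z0, steps=10):
    """Roll the learned model forward without touching the real environment."""
    z, h = z0, np.zeros(HIDDEN)
    trajectory = []
    for _ in range(steps):
        a = np.tanh(W_c @ np.concatenate([z, h]))     # C picks an action
        h = np.tanh(W_h @ np.concatenate([h, z, a]))  # M updates its hidden state
        z = np.tanh(W_z @ h)                          # M predicts the next latent
        trajectory.append((z, a))
    return trajectory

obs = rng.normal(size=OBS)          # a fake 64-dimensional observation
traj = dream_rollout(encode(obs))
print(len(traj))                    # 10 dreamed steps, never executed for real
```

The key point is the last loop: every state the agent experiences after the first frame is predicted by M, not observed, which is what lets the agent "dream" cheaply.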
The field has since scaled dramatically. Google DeepMind's Genie (2024) learned playable 2D environments from video, and Genie 2 extended this to 3D worlds generated from a single image. NVIDIA's Cosmos world foundation models are trained on massive video datasets to understand physical dynamics. These systems don't just predict the next frame — they model object permanence, gravity, collisions, and spatial relationships.
For game development, world models offer a provocative possibility: generating entire game levels or environments from learned priors rather than manual design. A world model trained on thousands of hours of gameplay footage could generate novel but plausible game scenarios, terrain layouts, or physics behaviors. This connects to procedural generation but operates at a fundamentally different level — generation from understanding rather than from rules.
World models are also critical for robotics and autonomous systems. A robot with an accurate world model can simulate the consequences of actions before executing them, dramatically reducing the need for real-world trial and error. Tesla's approach to self-driving, NVIDIA's Isaac platform for robotics, and various embodied AI projects all rely on learned world models for planning and simulation.
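Simulating consequences before acting is, concretely, a planning loop inside the learned model. The sketch below uses random-shooting model-predictive control: sample candidate action sequences, roll each through the model, execute only the first action of the best sequence, then replan. The `model_step` and `reward` functions are hypothetical stand-ins for learned components; everything here is illustrative, not any specific vendor's method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical learned dynamics and reward: a 2-D point robot nudged by actions.
# In a real system these would be neural networks trained from experience.
GOAL = np.array([1.0, 1.0])

def model_step(state, action):
    return state + 0.1 * action           # stand-in for the learned transition

def reward(state):
    return -np.linalg.norm(state - GOAL)  # closer to the goal is better

def plan(state, horizon=5, candidates=200):
    """Random-shooting MPC: simulate candidate action sequences in the
    world model and return the first action of the best one."""
    best_return, best_first_action = -np.inf, None
    for _ in range(candidates):
        seq = rng.uniform(-1, 1, size=(horizon, 2))
        s, total = state, 0.0
        for a in seq:                     # consequences simulated, not executed
            s = model_step(s, a)
            total += reward(s)
        if total > best_return:
            best_return, best_first_action = total, seq[0]
    return best_first_action

state = np.zeros(2)
for _ in range(20):                       # act in the "real" world, replan each step
    state = model_step(state, plan(state))
print(state)                              # ends near the goal
```

All of the trial and error happens inside `plan`, against the model; the real environment sees only twenty carefully chosen actions.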
The convergence of world models with generative video, 3D scene representations, and language models points toward a future where AI systems don't just generate static content but interactive, physically coherent environments that respond to actions in real time.
Further Reading
- The Agentic Web: Discovery, Commerce, and Creation — Jon Radoff
- The State of AI Agents in 2026 — Jon Radoff