Dreaming in Code for Curriculum Learning in Open-Ended Worlds

Konstantinos Mitsides, Maxence Faldor, Antoine Cully
Published: February 9, 2026

AI agents learn faster when foundation models generate custom training environments at the right level of difficulty.

Abstract

Open-ended learning frames intelligence as emerging from continual interaction with an ever-expanding space of environments. While recent advances have utilized foundation models to programmatically generate diverse environments, these approaches often focus on discovering isolated behaviors rather than orchestrating sustained progression. In complex open-ended worlds, the large combinatorial space of possible challenges makes it difficult for agents to discover sequences of experiences that remain consistently learnable. To address this, we propose Dreaming in Code (DiCode), a framework in which foundation models synthesize executable environment code to scaffold learning toward increasing competence. In DiCode, "dreaming" takes the form of materializing code-level variations of the world. We instantiate DiCode in Craftax, a challenging open-ended benchmark characterized by rich mechanics and long-horizon progression. Empirically, DiCode enables agents to acquire long-horizon skills, achieving a 16% improvement in mean return over the strongest baseline and non-zero success on late-game combat tasks where prior methods fail. Our results suggest that code-level environment design provides a practical mechanism for curriculum control, enabling the construction of intermediate environments that bridge competence gaps in open-ended worlds. Project page and source code are available at https://konstantinosmitsides.github.io/dreaming-in-code and https://github.com/konstantinosmitsides/dreaming-in-code.
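As a rough illustration of the loop the abstract describes, the sketch below shows a foundation model proposing environment code, that code being materialized by execution, and the resulting success rate guiding which environments to keep. Every name here (propose_env_code, materialize, evaluate_success_rate, the toy environment, and the placeholder policy) is a hypothetical stand-in, not the authors' API; the real system synthesizes variations of Craftax with an LLM backend and trains a full RL agent.

```python
# Hedged sketch of a "dreaming in code" loop: synthesize environment
# code, materialize it, and measure whether it is learnable. All names
# are illustrative assumptions, not DiCode's actual interface.

def propose_env_code(feedback: str) -> str:
    """Stand-in for a foundation-model call that emits environment code.
    Here we return a fixed toy environment; in DiCode this would be an
    LLM-synthesized, code-level variation of the world."""
    return (
        "def reset():\n"
        "    return 0\n"
        "def step(state, action):\n"
        "    state += action\n"
        "    done = state >= 3\n"
        "    return state, float(done), done\n"
    )

def materialize(env_code: str):
    """Execute synthesized code to obtain concrete reset/step functions."""
    namespace = {}
    exec(env_code, namespace)
    return namespace["reset"], namespace["step"]

def evaluate_success_rate(reset, step, policy, episodes=32) -> float:
    """Fraction of episodes in which the agent reaches the goal."""
    successes = 0
    for _ in range(episodes):
        state, done = reset(), False
        for _ in range(10):  # short horizon suffices for the toy env
            state, _reward, done = step(state, policy(state))
            if done:
                successes += 1
                break
    return successes / episodes

policy = lambda state: 1  # placeholder agent; RL training is elided here
reset, step = materialize(propose_env_code(feedback="previous env too hard"))
p = evaluate_success_rate(reset, step, policy)
print(f"success rate {p:.2f} -> keep this environment if p is near 0.5")
```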

Key Takeaways

  • Foundation models can generate executable code to create custom training environments tailored to agent skill levels.

  • Learnability metrics identify the optimal difficulty sweet spot where agents succeed approximately 50% of the time (see the sketch after this list).

  • Code-based environment generation unlocks vastly larger design spaces than the parameter-tuning approaches traditionally used in open-ended learning.
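The second takeaway can be made concrete with the standard learnability score from the unsupervised environment design literature, p(1 - p), which is maximized when the empirical success rate p is 0.5. Whether DiCode uses this exact formula is an assumption; the sketch below only illustrates why roughly 50% success marks the sweet spot, and the candidate environments are invented for the example.

```python
# Hedged sketch: a common formalization of "learnability" scores a
# candidate environment by p * (1 - p), where p is the agent's
# empirical success rate. The score peaks at p = 0.5 and vanishes for
# environments that are trivially easy (p = 1) or currently impossible
# (p = 0). Assumed for illustration; not confirmed as DiCode's metric.

def learnability(success_rate: float) -> float:
    """Peaks at 0.25 when success_rate == 0.5."""
    return success_rate * (1.0 - success_rate)

# Rank hypothetical candidate environments and train on the best one.
candidates = {"easy": 0.95, "sweet_spot": 0.50, "too_hard": 0.02}
ranked = sorted(candidates, key=lambda k: learnability(candidates[k]),
                reverse=True)
print(ranked)  # ['sweet_spot', 'easy', 'too_hard']
```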

Limitations

  • Parameter-tuning methods such as Prioritized Level Replay (PLR) can only adjust numerical difficulty knobs; they cannot modify game structure or mechanics.

  • Open-ended environments present massive combinatorial spaces where most scenarios are either trivially easy or impossibly difficult.

Keywords

foundation models, open-ended learning, environment synthesis, curriculum control, long-horizon progression, skill acquisition
