Multi-agent cooperation through in-context co-player inference

Marissa A. Weis, Maciej Wołczyk, Rajai Nasser, Rif A. Saurous, Blaise Agüera y Arcas, João Sacramento, Alexander Meulemans
Published: February 18, 2026
Authors: 7
Word Count: 9,172
Code: Includes code

Sequence models enable AI agents to cooperate by inferring co-player strategies in-context.

Abstract

Achieving cooperation among self-interested agents remains a fundamental challenge in multi-agent reinforcement learning. Recent work showed that mutual cooperation can be induced between "learning-aware" agents that account for and shape the learning dynamics of their co-players. However, existing approaches typically rely on hardcoded, often inconsistent, assumptions about co-player learning rules or enforce a strict separation between "naive learners" updating on fast timescales and "meta-learners" observing these updates. Here, we demonstrate that the in-context learning capabilities of sequence models allow for co-player learning awareness without requiring hardcoded assumptions or explicit timescale separation. We show that training sequence model agents against a diverse distribution of co-players naturally induces in-context best-response strategies, effectively functioning as learning algorithms on the fast intra-episode timescale. We find that the cooperative mechanism identified in prior work, where vulnerability to extortion drives mutual shaping, emerges naturally in this setting: in-context adaptation renders agents vulnerable to extortion, and the resulting mutual pressure to shape the opponent's in-context learning dynamics resolves into the learning of cooperative behavior. Our results suggest that standard decentralized reinforcement learning on sequence models combined with co-player diversity provides a scalable path to learning cooperative behaviors.
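The dynamic the abstract describes, inferring a co-player's strategy from the episode history and best-responding to it, can be illustrated with a toy iterated prisoner's dilemma. The sketch below is purely illustrative: the fixed strategies and the hand-coded `in_context_best_response` rule are simplified stand-ins for a trained sequence model, not the paper's actual method.

```python
# Prisoner's dilemma payoffs for the row player, with moves C (cooperate) / D (defect).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Each strategy receives its own view of the history: a list of
# (my_move, their_move) pairs for the rounds played so far.

def tit_for_tat(history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def in_context_best_response(history):
    """Toy stand-in for a sequence model: infer the co-player's rule
    from the episode so far and best-respond to it in-context."""
    if len(history) < 2:
        return "C"  # probe cooperatively for two rounds
    opp_moves = [opp for _, opp in history]
    # If the co-player ever cooperated, assume a reciprocal (TFT-like)
    # rule and sustain cooperation; otherwise switch to defection.
    return "C" if "C" in opp_moves else "D"

def play_episode(agent, co_player, n_rounds=10):
    """Play one episode and return the agent's total payoff."""
    history, total = [], 0
    for _ in range(n_rounds):
        a = agent(history)
        # The co-player sees the same rounds with the roles swapped.
        b = co_player([(opp, me) for me, opp in history])
        total += PAYOFF[(a, b)]
        history.append((a, b))
    return total

# Against tit-for-tat the adaptive agent settles into mutual cooperation
# (10 * 3 = 30); against always-defect it probes twice, then defects
# for the remaining 8 rounds (0 + 0 + 8 * 1 = 8).
print(play_episode(in_context_best_response, tit_for_tat))   # → 30
print(play_episode(in_context_best_response, always_defect)) # → 8
```

In the paper's setting no such rule is hardcoded: the claim is that training a sequence model against a diverse distribution of co-players causes an analogous inference-and-best-response policy to emerge in-context.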

Key Takeaways

  1. AI agents can learn cooperation through in-context learning without explicit opponent modeling or weight updates.

  2. Heterogeneous agent populations create gradient pressures that drive mutual defection avoidance and genuine cooperation.

  3. Sequence models naturally develop adaptive capabilities that replicate complex multi-agent cooperation dynamics efficiently.

Limitations

  • Prior approaches require rigid assumptions about opponent learning or artificial separation of naive and meta-learners.

  • Standard reinforcement learning struggles with non-stationary environments where all agents simultaneously learn and adapt.

Keywords

multi-agent reinforcement learning, sequence models, in-context learning, cooperative behavior, learning-aware agents, learning dynamics, fast timescale, meta-learners, decentralized reinforcement learning, co-player diversity
