AI Agents

Discovering Multiagent Learning Algorithms with Large Language Models

Zun Li, John Schultz, Daniel Hennes, Marc Lanctot

Published: February 18, 2026
Authors: 4
Word Count: 6,105
Code: Included

LLMs discover novel multiagent algorithms through automated code evolution, outperforming manual human designs.

Abstract

Much of the advancement of Multi-Agent Reinforcement Learning (MARL) in imperfect-information games has historically depended on manual iterative refinement of baselines. While foundational families like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO) rest on solid theoretical ground, the design of their most effective variants often relies on human intuition to navigate a vast algorithmic design space. In this work, we propose the use of AlphaEvolve, an evolutionary coding agent powered by large language models, to automatically discover new multiagent learning algorithms. We demonstrate the generality of this framework by evolving novel variants for two distinct paradigms of game-theoretic learning. First, in the domain of iterative regret minimization, we evolve the logic governing regret accumulation and policy derivation, discovering a new algorithm, Volatility-Adaptive Discounted (VAD-)CFR. VAD-CFR employs novel, non-intuitive mechanisms, including volatility-sensitive discounting, consistency-enforced optimism, and a hard warm-start policy accumulation schedule, to outperform state-of-the-art baselines like Discounted Predictive CFR+. Second, in the regime of population-based training algorithms, we evolve training-time and evaluation-time meta-strategy solvers for PSRO, discovering a new variant, Smoothed Hybrid Optimistic Regret (SHOR-)PSRO. SHOR-PSRO introduces a hybrid meta-solver that linearly blends Optimistic Regret Matching with a smoothed, temperature-controlled distribution over best pure strategies. By dynamically annealing this blending factor and diversity bonuses during training, the algorithm automates the transition from population diversity to rigorous equilibrium finding, yielding superior empirical convergence compared to standard static meta-solvers.
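The hybrid meta-solver described above can be pictured with a minimal sketch. The function names, signatures, and the choice of softmax as the "smoothed, temperature-controlled distribution" are assumptions for illustration; the abstract does not specify SHOR-PSRO's exact formulas, and the optimism term of Optimistic Regret Matching is reduced here to plain regret matching for brevity.

```python
import numpy as np

def regret_matching(regrets):
    """Play each strategy in proportion to its positive regret.
    (Optimistic variants would add a predicted-regret term first.)"""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    if total <= 0.0:
        return np.ones_like(pos) / len(pos)  # uniform fallback
    return pos / total

def smoothed_best_response(payoffs, temperature):
    """Softmax over per-strategy payoffs: a smoothed distribution that
    concentrates on the best pure strategies as temperature -> 0."""
    z = (payoffs - payoffs.max()) / max(temperature, 1e-8)
    e = np.exp(z)
    return e / e.sum()

def hybrid_meta_solver(regrets, payoffs, blend, temperature):
    """Linear blend of the two distributions. Annealing `blend` and
    `temperature` over training would shift the solver from diverse
    exploration toward equilibrium-focused play."""
    rm = regret_matching(regrets)
    sbr = smoothed_best_response(payoffs, temperature)
    return (1.0 - blend) * rm + blend * sbr
```

With `blend = 0` this reduces to pure regret matching; with `blend = 1` and a low temperature it approaches a best response, which is the trade-off the annealing schedule is said to automate.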

Key Takeaways

  1. Large language models can automatically discover novel multiagent learning algorithms by semantically mutating source code through evolutionary search.

  2. Two new algorithms, VAD-CFR and SHOR-PSRO, outperformed state-of-the-art baselines despite employing unintuitive mechanisms that a human designer would be unlikely to propose.

  3. Treating algorithm design as a searchable problem automates the manual trial-and-error process that has historically driven advances in multiagent reinforcement learning.
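The evolutionary search in the first takeaway can be sketched generically. AlphaEvolve's actual system (its program database, prompt construction, and LLM ensemble) is far more elaborate and is not described in this summary; the loop below is a hypothetical skeleton in which `mutate` stands in for an LLM rewriting source code and `evaluate` for scoring a candidate algorithm on proxy games.

```python
import random

def evolve(seed_program, mutate, evaluate, generations=50, pop_size=8):
    """Generic evolutionary search over candidate programs.
    Keeps a small population, selects a parent by tournament,
    mutates it, and retains the fittest individuals."""
    population = [(seed_program, evaluate(seed_program))]
    for _ in range(generations):
        contenders = random.sample(population, min(3, len(population)))
        parent, _ = max(contenders, key=lambda p: p[1])  # tournament select
        child = mutate(parent)
        population.append((child, evaluate(child)))
        population.sort(key=lambda p: p[1], reverse=True)
        population = population[:pop_size]  # survival of the fittest
    return population[0]  # (best program, best score)
```

In the LLM-driven setting, `mutate` would prompt the model with the parent's source code and evaluation feedback and parse the rewritten program from the response; the loop itself is unchanged.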

Limitations

  • The evolutionary approach requires evaluation on proxy games, which may not fully represent real-world algorithmic performance.

  • The source text is incomplete and does not report full validation results or the computational cost of the discovery process.

Keywords

Multi-Agent Reinforcement Learning, imperfect-information games, Counterfactual Regret Minimization, Policy Space Response Oracles, evolutionary coding, large language models, regret accumulation, policy derivation, Volatility-Adaptive Discounted CFR, Discounted Predictive CFR, population-based training, meta-strategy solvers, Optimistic Regret Matching, smoothed distribution, equilibrium finding
