
MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering

Chuanzhe Guo, Jingjing Wu, Sijun He, Yang Chen, Zhaoqi Kuang, Shilong Fan, Bingjin Chen, Siqi Bao, Jing Liu, Hua Wu, Qingfu Zhu, Wanxiang Che, Haifeng Wang
Published: January 30, 2026
Authors: 13
Word Count: 12,275
Code: Includes code

Automates the construction of executable, verifiable environments for LLM software engineering agents across multiple programming languages.

Abstract

The evolution of Large Language Model (LLM) agents for software engineering (SWE) is constrained by the scarcity of verifiable datasets, a bottleneck stemming from the complexity of constructing executable environments across diverse languages. To address this, we introduce MEnvAgent, a Multi-language framework for automated Environment construction that facilitates scalable generation of verifiable task instances. MEnvAgent employs a multi-agent Planning-Execution-Verification architecture to autonomously resolve construction failures and integrates a novel Environment Reuse Mechanism that reduces computational overhead by incrementally patching historical environments. Evaluations on MEnvBench, a new benchmark comprising 1,000 tasks across 10 languages, demonstrate that MEnvAgent outperforms baselines, improving Fail-to-Pass (F2P) rates by 8.6% while reducing time costs by 43%. Additionally, we demonstrate the utility of MEnvAgent by constructing MEnvData-SWE, the largest open-source polyglot dataset of realistic verifiable Docker environments to date, alongside solution trajectories that enable consistent performance gains on SWE tasks across a wide range of models. Our code, benchmark, and dataset are available at https://github.com/ernie-research/MEnvAgent.
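
The Fail-to-Pass (F2P) criterion mentioned in the abstract can be illustrated with a small sketch. It assumes the common convention that a task instance counts as verifiable when at least one test fails before the reference fix and passes after it. The function names and the dictionary-based test results are hypothetical and are not taken from the MEnvAgent code.

```python
from typing import Dict, List


def fail_to_pass(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Return the tests that fail before the fix and pass after it.

    Assumption: a test missing from `before` (for example, one that could
    not be collected) is treated as failing before the fix.
    """
    return sorted(
        name
        for name, passed_after in after.items()
        if passed_after and not before.get(name, False)
    )


def is_verifiable(before: Dict[str, bool], after: Dict[str, bool]) -> bool:
    """A task instance is verifiable if it has at least one F2P test."""
    return len(fail_to_pass(before, after)) > 0


# Hypothetical test results: True means the test passed.
results_before = {"test_parse": False, "test_io": True, "test_cli": False}
results_after = {"test_parse": True, "test_io": True, "test_cli": False}

print(fail_to_pass(results_before, results_after))   # ['test_parse']
print(is_verifiable(results_before, results_after))  # True
```

Under this reading, an environment that cannot run the tests at all yields no F2P tests, so the instance cannot be verified.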

Key Takeaways

  1. Automates scalable, polyglot environment setup for LLM agents.

  2. Uses a multi-agent Planning-Execution-Verification architecture to resolve construction failures autonomously.

  3. Implements an Environment Reuse Mechanism that reduces computational overhead by incrementally patching historical environments.
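
The takeaways above can be sketched as a simple control loop. This is a minimal illustration, assuming a plan, execute, verify cycle that feeds verification errors back into planning and optionally starts from a previously built environment. The `Environment` class, the callable signatures, and the toy functions are all hypothetical; the actual MEnvAgent agents and Docker handling are not described on this page.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Environment:
    """Hypothetical stand-in for a built environment (e.g. a Docker image)."""
    commands: List[str] = field(default_factory=list)


def build_environment(
    plan: Callable[[Environment, Optional[str]], List[str]],
    execute: Callable[[Environment, List[str]], Environment],
    verify: Callable[[Environment], Optional[str]],
    base: Optional[Environment] = None,
    max_rounds: int = 3,
) -> Optional[Environment]:
    """Plan, execute, and verify until verification passes or rounds run out.

    If `base` is given, construction starts from that historical environment
    and only patches it, instead of starting from scratch.
    """
    env = base if base is not None else Environment()
    error: Optional[str] = None
    for _ in range(max_rounds):
        steps = plan(env, error)    # planning sees the last failure, if any
        env = execute(env, steps)   # execution applies the planned steps
        error = verify(env)         # verification returns None on success
        if error is None:
            return env
    return None


# Toy callables, for illustration only.
def toy_plan(env: Environment, error: Optional[str]) -> List[str]:
    wanted = ["install base-deps"] if error is None else ["install missing-lib"]
    return [step for step in wanted if step not in env.commands]


def toy_execute(env: Environment, steps: List[str]) -> Environment:
    return Environment(env.commands + steps)


def toy_verify(env: Environment) -> Optional[str]:
    if "install missing-lib" in env.commands:
        return None
    return "ImportError: missing-lib"


fresh = build_environment(toy_plan, toy_execute, toy_verify)
reused = build_environment(
    toy_plan, toy_execute, toy_verify, base=Environment(["install base-deps"])
)
```

In this toy setup the reused build skips the base installation because it is already present in the historical environment, which is the kind of saving the reuse mechanism targets.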

Limitations

  • Dependent on the availability of historical environments for reuse.

  • May face challenges with highly unique or complex dependencies.

Keywords

Large Language Model agents, software engineering, verifiable datasets, multi-agent Planning-Execution-Verification architecture, Environment Reuse Mechanism, Docker environments, MEnvBench, MEnvData-SWE
