EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies
Xavier Hu, Jinxiang Xia, Shengze Xu +13 more
Long-horizon planning is widely recognized as a core capability of autonomous LLM-based agents; however, current evaluation frameworks are largely episodic, domain-specific, or insufficiently grounded in persistent economic dynamics. We introduce EcoGym, a generalizable benchmark for c...