No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning
Zhicong Li, Lingjie Jiang, Yulan Hu +7 more
Critique-guided reinforcement learning (RL) has emerged as a powerful paradigm for training LLM agents by augmenting sparse outcome rewards with natural-language feedback. However, current methods often rely on static or offline critic models, which fail to adapt as the policy evolves. In on-policy ...