TRIP-Bench: A Benchmark for Long-Horizon Interactive Agents in Real-World Scenarios
Yuanzhe Shen, Zisu Huang, Zhengyuan Wang, and 14 more authors
As LLM-based agents are deployed in increasingly complex real-world settings, existing benchmarks underrepresent key challenges such as enforcing global constraints, coordinating multi-tool reasoning, and adapting to evolving user behavior over long, multi-turn interactions. To bridge this gap, we i...