MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching
Changle Qu, Sunhao Dai, Hengyi Cai +3 more
Tool-Integrated Reasoning (TIR) empowers large language models (LLMs) to tackle complex tasks by interleaving reasoning steps with external tool interactions. However, existing reinforcement learning methods typically rely on outcome- or trajectory-level rewards, assigning uniform advantages to all ...
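To make the notion of a uniform, trajectory-level advantage concrete, here is a minimal sketch. It assumes a group-normalized scheme in which one scalar outcome reward per trajectory is standardized across a sampled group and then copied to each step of that trajectory. The abstract does not name a specific algorithm, so the function name `uniform_step_advantages`, the normalization, and the toy rewards are all illustrative assumptions and are not taken from the paper.

```python
from statistics import mean, pstdev


def uniform_step_advantages(outcome_rewards, steps_per_trajectory, eps=1e-8):
    """Broadcast one normalized outcome reward to every step of a trajectory.

    Hypothetical helper for illustration only: it assumes one scalar reward
    per trajectory and a group-wise mean/std normalization.
    """
    mu = mean(outcome_rewards)
    sigma = pstdev(outcome_rewards)  # population std over the sampled group
    advantages = []
    for reward, n_steps in zip(outcome_rewards, steps_per_trajectory):
        a = (reward - mu) / (sigma + eps)
        # Every step receives the same value, whether or not that
        # particular step (e.g. a tool call) was actually helpful.
        advantages.append([a] * n_steps)
    return advantages


# Toy group of four trajectories: two succeed, two fail (made-up values).
rewards = [1.0, 0.0, 1.0, 0.0]
steps = [3, 2, 4, 1]
adv = uniform_step_advantages(rewards, steps)

for i, per_step in enumerate(adv):
    print(f"trajectory {i}: {per_step}")
```

In this sketch, a successful trajectory credits each of its steps equally, including any redundant or erroneous ones, and a failed trajectory penalizes each of its steps equally. That lack of per-step distinction is the limitation the title's "fine-grained supervision" is aimed at.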