Latest AI Agents Research Papers

Research on autonomous AI agents, tool use, planning, and systems that can take actions to accomplish goals.

154 Papers

DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval

Maojun Sun, Yue Wu, Yifei Xie +5 more

Large Language Model (LLM) agents can automate data-science workflows, but many rigorous statistical methods implemented in R remain underused because LLMs struggle with statistical knowledge and tool retrieval. Existing retrieval-augmented approaches focus on function-level semantics and ignore dat...

retrieval-augmented approaches, function-level semantics, data distribution, R Package Knowledge Base, embedding model, +5 more
Mar 5, 2026 · 45

Scaling Agentic Capabilities, Not Context: Efficient Reinforcement Finetuning for Large Toolspaces

Karan Gupta, Pranav Vajreshwari, Yash Pandya +3 more

Agentic systems operating over large tool ecosystems must plan and execute long-horizon workflows under weak or non-verifiable supervision. While frontier models mitigate these challenges through scale and large context budgets, small language models (SLMs) remain brittle: eager tool loading saturat...

reinforcement fine-tuning, context control, execution structure, tool orchestration, programmatic tool orchestration, +5 more
Mar 5, 2026 · 12

Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Zhenting Wang, Huancheng Chen, Jiayun Wang +1 more

Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool outputs and intermediate reasoning in-context quickly becomes infeasible: the working context becomes prohibitively long, eventually exceeds the cont...

large language model agents, context windows, experience memory, indexed memory, reinforcement learning, +6 more
Mar 4, 2026 · 14

BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?

Guoxin Chen, Fanzhe Meng, Jiale Zhao +12 more

Current benchmarks for code agents primarily assess narrow, repository-specific fixes, overlooking critical real-world challenges such as cross-repository reasoning, domain-specialized problem solving, dependency-driven migration, and full-repository generation. To address this gap, we introduce Bey...

code agents, benchmarks, cross-repository reasoning, domain-specialized problem solving, dependency-driven migration, +8 more
Mar 3, 2026 · 50

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

Dadi Guo, Yuejin Xie, Qingyu Liu +7 more

As large language models (LLMs) advance their mathematical capabilities toward the IMO level, the scarcity of challenging, high-quality problems for training and evaluation has become a significant bottleneck. Simultaneously, recent code agents have demonstrated sophisticated skills in agentic codin...

large language models, mathematical capabilities, code agents, agentic coding, reasoning, +3 more
Mar 3, 2026 · 13

CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification

Jinpeng Chen, Cheng Gong, Hanbo Li +9 more

Developing multi-turn interactive tool-use agents is challenging because real-world user needs are often complex and ambiguous, yet agents must execute deterministic actions to satisfy them. To address this gap, we introduce CoVe (Constraint-Verification), a post-training data synthesis framework de...

post-training data synthesis, supervised fine-tuning, reinforcement learning, task constraints, trajectory generation, +4 more
Mar 2, 2026 · 20

SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale

Ibragim Badertdinov, Maksim Nekrashevich, Anton Shevtsov +1 more

Software engineering agents (SWE) are improving rapidly, with recent gains largely driven by reinforcement learning (RL). However, RL training is constrained by the scarcity of large-scale task collections with reproducible execution environments and reliable test suites. Although a growing number o...

reinforcement learning, software engineering agents, SWE-bench, LLM judges, reproducible execution, +5 more
Feb 27, 2026 · 47

MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

Zhiheng Song, Jingshuai Zhang, Chuan Qin +6 more

Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is hindered by diverse rout...

route-planning agents, large language models, MobilityBench, API-replay sandbox, deterministic environment, +7 more
Feb 26, 2026 · 103

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Qianben Chen, Tianrui Qin, King Zhu +21 more

Recent deep research agents primarily improve performance by scaling reasoning depth, but this leads to high inference cost and latency in search-intensive scenarios. Moreover, generalization across heterogeneous research settings remains challenging. In this work, we propose Search More, Think Less...

deep research agents, reasoning depth, inference cost, search-intensive scenarios, generalization, +9 more
Feb 26, 2026 · 20

Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization

Zeyuan Liu, Jeonghye Kim, Xufang Luo +2 more

Exploration remains the key bottleneck for large language model agents trained with reinforcement learning. While prior methods exploit pretrained knowledge, they fail in environments requiring the discovery of novel states. We propose Exploratory Memory-Augmented On- and Off-Policy Optimization (EM...

reinforcement learning, large language model agents, exploration, memory augmentation, on-policy updates, +3 more
Feb 26, 2026 · 33

AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning

Yutong Wang, Siyuan Xiong, Xuebo Liu +4 more

While Multi-Agent Systems (MAS) excel in complex reasoning, they suffer from the cascading impact of erroneous information generated by individual participants. Current solutions often resort to rigid structural engineering or expensive fine-tuning, limiting their deployability and adaptability. We ...

multi-agent systems, test-time rectify-or-reject pruning, retrieval-augmented rectifier, failure-driven indicator pool, distilled failure patterns, +3 more
Feb 26, 2026 · 27

SkillNet: Create, Evaluate, and Connect AI Skills

Yuan Liang, Ruobin Zhong, Haoming Xu +46 more

Current AI agents can flexibly invoke tools and execute complex tasks, yet their long-term advancement is hindered by the lack of systematic accumulation and transfer of skills. Without a unified mechanism for skill consolidation, agents frequently "reinvent the wheel", rediscovering solutions in ...

AI agents, skill consolidation, tool invocation, long-term advancement, unified mechanism, +9 more
Feb 26, 2026 · 68

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Rui Yang, Qianhui Wu, Zhaoyang Wang +8 more

Open-source native GUI agents still lag behind closed-source systems on long-horizon navigation tasks. This gap stems from two limitations: a shortage of high-quality, action-aligned reasoning data, and the direct adoption of generic post-training pipelines that overlook the unique challenges of GUI...

GUI agents, action-aligned reasoning data, post-training pipelines, SFT, CoT reasoning, +9 more
Feb 25, 2026 · 15

SkillOrchestra: Learning to Route Agents via Skill Transfer

Jiayu Wang, Yifei Ming, Zixuan Ke +3 more

Compound AI systems promise capabilities beyond those of individual models, yet their success depends critically on effective orchestration. Existing routing approaches face two limitations: (1) input-level routers make coarse query-level decisions that ignore evolving task requirements; (2) RL-trai...

compound AI systems, orchestration, routing policy, reinforcement learning, skill modeling, +5 more
Feb 23, 2026 · 46

Calibrate-Then-Act: Cost-Aware Exploration in LLM Agents

Wenxuan Ding, Nicholas Tomlin, Greg Durrett

LLMs are increasingly being used for complex problems which are not necessarily resolved in a single response, but require interacting with an environment to acquire information. In these scenarios, LLMs must reason about inherent cost-uncertainty tradeoffs in when to stop exploring and commit to an...

large language models, sequential decision-making, cost-uncertainty tradeoffs, environment exploration, Calibrate-Then-Act framework, +5 more
Feb 18, 2026 · 14

Discovering Multiagent Learning Algorithms with Large Language Models

Zun Li, John Schultz, Daniel Hennes +1 more

Much of the advancement of Multi-Agent Reinforcement Learning (MARL) in imperfect-information games has historically depended on manual iterative refinement of baselines. While foundational families like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO) rest on solid ...

Multi-Agent Reinforcement Learning, imperfect-information games, Counterfactual Regret Minimization, Policy Space Response Oracles, evolutionary coding, +10 more
Feb 18, 2026 · 11