Latest AI Agents Research Papers

Research on autonomous AI agents, tool use, planning, and systems that can take actions to accomplish goals.

156 Papers


MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching

Changle Qu, Sunhao Dai, Hengyi Cai +3 more

Tool-Integrated Reasoning (TIR) empowers large language models (LLMs) to tackle complex tasks by interleaving reasoning steps with external tool interactions. However, existing reinforcement learning methods typically rely on outcome- or trajectory-level rewards, assigning uniform advantages to all ...

Jan 15, 2026 | 23

Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

Xinyu Zhu, Yuzhu Cai, Zexi Liu +11 more

The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy, the ability to sustain strategic coherence and iterative correction over experimental cycles spanning days or weeks. While Large Language Models (LLMs) have de...

Large Language Models, ultra-long-horizon autonomy, machine learning engineering, Hierarchical Cognitive Caching, cognitive accumulation +4 more

Jan 15, 2026 | 35

Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text

Zhihao Xu, Rumei Li, Jiahuan Li +4 more

Enabling Large Language Models (LLMs) to effectively utilize tools in multi-turn interactions is essential for building capable autonomous agents. However, acquiring diverse and realistic multi-turn tool-use data remains a significant challenge. In this work, we propose a novel text-based paradigm. ...

large language models, multi-turn interactions, tool-use data, text corpora, data synthesis pipeline +9 more

Jan 15, 2026 | 36

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

Caihua Li, Lianghong Guo, Yanlin Wang +9 more

Issue resolution, a complex Software Engineering (SWE) task integral to real-world development, has emerged as a compelling challenge for artificial intelligence. The establishment of benchmarks like SWE-bench revealed this task as profoundly difficult for large language models, thereby significantl...

large language models, software engineering, autonomous coding agents, training-free frameworks, supervised fine-tuning +3 more

Jan 15, 2026 | 55

PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records

Yibo Lyu, Gongwei Chen, Rui Shao +2 more

While GUI agents have shown strong performance under explicit and completion instructions, real-world deployment requires aligning with users' more complex implicit intents. In this work, we highlight Hierarchical Implicit Intent Alignment for Personalized GUI Agent (PersonalAlign), a new agent task...

Hierarchical Implicit Intent Alignment, Personalized GUI Agent, long-term user records, vague instructions, latent routines +5 more

Jan 14, 2026 | 6

EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines

Shuo Zhang, Chaofa Yuan, Ryan Guo +11 more

While LLM-based agents have shown promise for deep research, most existing approaches rely on fixed workflows that struggle to adapt to real-world, open-ended queries. Recent work therefore explores self-evolution by allowing agents to rewrite their own code or prompts to improve problem-solving abi...

Jan 14, 2026 | 1

MAXS: Meta-Adaptive Exploration with LLM Agents

Jian Zhang, Zhiyuan Wang, Zhangqi Wang +7 more

Large Language Model (LLM) Agents exhibit inherent reasoning abilities through the collaboration of multiple tools. However, during agent inference, existing methods often suffer from (i) locally myopic generation, due to the absence of lookahead, and (ii) trajectory instability, where minor early e...

Jan 14, 2026 | 16

DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

Yibo Wang, Lei Wang, Yue Deng +7 more

Deep research systems are widely used for multi-step web research, analysis, and cross-source synthesis, yet their evaluation remains challenging. Existing benchmarks often require annotation-intensive task construction, rely on static evaluation dimensions, or fail to reliably verify facts when cit...

Jan 14, 2026 | 36

User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale

Jungho Cho, Minbyul Jeong, Sungrae Park

The recent paradigm shift toward large reasoning models (LRMs) as autonomous agents has intensified the demand for sophisticated, multi-turn tool-use capabilities. Yet, existing datasets and data-generation approaches are limited by static, predefined toolsets that cannot scale to the complexity of ...

Jan 13, 2026 | 41

ExpSeek: Self-Triggered Experience Seeking for Web Agents

Wenyuan Zhang, Xinghua Zhang, Haiyang Yu +5 more

Experience intervention in web agents emerges as a promising technical paradigm, enhancing agent interaction capabilities by providing valuable insights from accumulated experiences. However, existing methods predominantly inject experience passively as global context before task execution, struggli...

Jan 13, 2026 | 5

The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios

Daocheng Fu, Jianbiao Mei, Rong Wu +7 more

The rapid evolution of Multi-modal Large Language Models (MLLMs) has advanced workflow automation; however, existing research mainly targets performance upper bounds in static environments, overlooking robustness for stochastic real-world deployment. We identify three key challenges: dynamic task sc...

Jan 13, 2026 | 2

Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models

Youwei Liu, Jian Wang, Hanlin Wang +2 more

Recent advances in world models have shown promise for modeling future dynamics of environmental states, enabling agents to reason and act without accessing real environments. Current methods mainly perform single-step or fixed-horizon rollouts, leaving their potential for complex task planning unde...

Jan 13, 2026 | 10

The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents

Weihao Xuan, Qingcheng Zeng, Heli Qi +3 more

Autonomous agents based on large language models (LLMs) are rapidly evolving to handle multi-turn tasks, but ensuring their trustworthiness remains a critical challenge. A fundamental pillar of this trustworthiness is calibration, which refers to an agent's ability to express confidence that reliabl...

Jan 12, 2026 | 21

MemoBrain: Executive Memory as an Agentic Brain for Reasoning

Hongjin Qian, Zhao Cao, Zheng Liu

Complex reasoning in tool-augmented agent frameworks is inherently long-horizon, causing reasoning traces and transient tool artifacts to accumulate and strain the bounded working context of large language models. Without explicit memory mechanisms, such accumulation disrupts logical continuity and ...

Jan 12, 2026 | 33

MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences

Qihao Wang, Ziming Cheng, Shuo Zhang +12 more

While autonomous software engineering (SWE) agents are reshaping programming paradigms, they currently suffer from a "closed-world" limitation: they attempt to fix bugs from scratch or solely using local context, ignoring the immense historical human experience available on platforms like GitHub. Ac...

Jan 11, 2026 | 65

ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

Siyuan Hu, Kevin Qinghong Lin, Mike Zheng Shou

Building intelligent agents capable of dexterous manipulation is essential for achieving human-like automation in both robotics and digital environments. However, existing GUI agents rely on discrete click predictions (x,y), which prohibits free-form, closed-loop trajectories (e.g. dragging a progre...

Dec 31, 2025 | 37
