Latest AI Agents Research Papers

Research on autonomous AI agents, tool use, planning, and systems that can take actions to accomplish goals.

156 Papers

DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents

Nikita Gupta, Riju Chatterjee, Lukas Haas +9 more

We introduce DeepSearchQA, a 900-prompt benchmark for evaluating agents on difficult multi-step information-seeking tasks across 17 different fields. Unlike traditional benchmarks that target single answer retrieval or broad-spectrum factuality, DeepSearchQA features a dataset of challenging, handcr...

multi-step information-seeking tasks, causal chain, systematic collation, de-duplication, entity resolution, +9 more
Jan 28, 2026 · 6

BMAM: Brain-inspired Multi-Agent Memory Framework

Yang Li, Jiaxiang Liu, Yusong Wang +2 more

Language-model-based agents operating over extended interaction horizons face persistent challenges in preserving temporally grounded information and maintaining behavioral consistency across sessions, a failure mode we term soul erosion. We present BMAM (Brain-inspired Multi-Agent Memory), a genera...

language-model-based agents, extended interaction horizons, temporally grounded information, behavioral consistency, soul erosion, +9 more
Jan 28, 2026 · 3

AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts

Shicheng Fang, Yuxin Wang, XiaoRan Liu +5 more

The evolution of Large Language Models (LLMs) into autonomous agents necessitates the management of extensive, dynamic contexts. Current benchmarks, however, remain largely static, relying on passive retrieval tasks that fail to simulate the complexities of agent-environment interaction, such as non...

Large Language Models, autonomous agents, dynamic contexts, AgentLongBench, Lateral Thinking Puzzles, +7 more
Jan 28, 2026 · 18

OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution

Le Zhang, Yixiong Xiao, Xinjiang Lu +12 more

Graphical User Interface (GUI) agents show great potential for enabling foundation models to complete real-world tasks, revolutionizing human-computer interaction and improving human productivity. In this report, we present OmegaUse, a general-purpose GUI agent model for autonomous task execution on...

GUI agents, foundation models, autonomous task execution, data-construction pipeline, decoupled training paradigm, +11 more
Jan 28, 2026 · 3

CUA-Skill: Develop Skills for Computer Using Agent

Tianyi Chen, Yinheng Li, Michael Solodko +12 more

Computer-Using Agents (CUAs) aim to autonomously operate computer systems to complete real-world tasks. However, existing agentic systems remain difficult to scale and lag behind human performance. A key limitation is the absence of reusable and structured skill abstractions that capture how humans ...

computer-using agents, skill base, execution graphs, composition graphs, dynamic skill retrieval, +4 more
Jan 28, 2026 · 11

Self-Distillation Enables Continual Learning

Idan Shenfeld, Mehul Damani, Jonas Hübotter +1 more

Continual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement learning can reduce forgetting, it requires explicit reward functions that are often unavailable. Lear...

continual learning, reinforcement learning, supervised fine-tuning, on-policy learning, off-policy learning, +5 more
Jan 27, 2026 · 5

DRPG (Decompose, Retrieve, Plan, Generate): An Agentic Framework for Academic Rebuttal

Peixuan Han, Yingjie Yu, Jingjun Xu +1 more

Despite the growing adoption of large language models (LLMs) in scientific research workflows, automated support for academic rebuttal, a crucial step in academic communication and peer review, remains largely underexplored. Existing approaches typically rely on off-the-shelf LLMs or simple pipeline...

large language models, academic rebuttal, agentic framework, long-context understanding, automated reasoning, +2 more
Jan 26, 2026 · 7

Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents

Zhihan Liu, Lin Guan, Yixin Nie +6 more

Generalist LLM agents are often post-trained on a narrow set of environments but deployed across far broader, unseen domains. In this work, we investigate the challenge of agentic post-training when the eventual test domains are unknown. Specifically, we analyze which properties of reinforcement lea...

reinforcement learning, out-of-domain performance, state information richness, planning complexity, goal reachability, +4 more
Jan 26, 2026 · 8

DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints

Yinger Zhang, Shutong Jiang, Renhao Li +6 more

While agent evaluation has shifted toward long-horizon tasks, most benchmarks still emphasize local, step-level reasoning rather than the global constrained optimization (e.g., time and financial budgets) that demands genuine planning ability. Meanwhile, existing LLM planning benchmarks underreprese...

agent evaluation, long-horizon tasks, global constrained optimization, local constrained reasoning, agentic LLMs, +2 more
Jan 26, 2026 · 19

daVinci-Dev: Agent-native Mid-training for Software Engineering

Ji Zeng, Dayuan Fu, Tiantian Mi +14 more

Recently, the frontier of Large Language Model (LLM) capabilities has shifted from single-turn code generation to agentic software engineering, a paradigm where models autonomously navigate, edit, and test complex repositories. While post-training methods have become the de facto approach for code ag...

Large Language Model, agentic software engineering, mid-training, distribution mismatch, agent-native data, +5 more
Jan 26, 2026 · 113

PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR

James Burgess, Jan N. Hansen, Duo Peng +5 more

Search agents are language models (LMs) that reason and search knowledge bases (or the web) to answer questions; recent methods supervise only the final answer accuracy using reinforcement learning with verifiable rewards (RLVR). Most RLVR search agents tackle general-domain QA, which limits their r...

language models, reinforcement learning with verifiable rewards, search agents, knowledge bases, scientific papers, +5 more
Jan 26, 2026 · 14

SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback

Fangyuan Xu, Rujun Han, Yanfei Chen +7 more

Deep search agents, which aim to answer complex questions requiring reasoning across multiple documents, can significantly speed up the information-seeking process. Collecting human annotations for this application is prohibitively expensive due to long and complex exploration trajectories. We propo...

deep search agents, question-answer pairs, reasoning strategies, intrinsic evaluation, extrinsic evaluation, +3 more
Jan 26, 2026 · 6

Yunjue Agent Tech Report: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks

Haotian Li, Shijun Yang, Weizhen Qi +5 more

Conventional agent systems often struggle in open-ended environments where task distributions continuously drift and external supervision is scarce. Their reliance on static toolsets or offline training lags behind these dynamics, leaving the system's capability boundaries rigid and unknown. To addr...

agent systems, open-ended environments, task distributions, tool evolution, self-evolving paradigm, +7 more
Jan 26, 2026 · 5

SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents

Yuhang Wang, Yuling Shi, Mo Yang +7 more

LLM agents have demonstrated remarkable capabilities in software development, but their performance is hampered by long interaction contexts, which incur high API costs and latency. While various context compression approaches such as LongLLMLingua have emerged to tackle this challenge, they typical...

context compression, LongLLMLingua, PPL, code understanding, task-aware adaptive pruning, +4 more
Jan 23, 2026 · 71

LongCat-Flash-Thinking-2601 Technical Report

Meituan LongCat Team, Anchun Gui, Bei Li +159 more

We introduce LongCat-Flash-Thinking-2601, a 560-billion-parameter open-source Mixture-of-Experts (MoE) reasoning model with superior agentic reasoning capability. LongCat-Flash-Thinking-2601 achieves state-of-the-art performance among open-source models on a wide range of agentic benchmarks, includi...

Mixture-of-Experts, agentic reasoning, domain-parallel expert training, fusion, asynchronous reinforcement learning, +8 more
Jan 23, 2026 · 149

Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind

Zhitao He, Zongwei Lyu, Yi R Fung

Although artificial intelligence (AI) has become deeply integrated into various stages of the research workflow and achieved remarkable advancements, academic rebuttal remains a significant and underexplored challenge. This is because rebuttal is a complex process of strategic communication under se...

Theory of Mind, ToM-Strategy-Response pipeline, supervised fine-tuning, reinforcement learning, self-reward mechanism, +2 more
Jan 22, 2026 · 13

Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

Yuxuan Wan, Tianqing Fang, Zaitang Li +5 more

Recent advances in Deep Research Agents (DRAs) are transforming automated knowledge discovery and problem-solving. While the majority of existing efforts focus on enhancing policy capabilities via post-training, we propose an alternative paradigm: self-evolving the agent's ability by iteratively ver...

Deep Research Agents, policy capabilities, post-training, inference-time scaling, verification, +7 more
Jan 22, 2026 · 16

Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors

Zhiwei Zhang, Fei Zhao, Rui Wang +6 more

Large language models (LLMs) can call tools effectively, yet they remain brittle in multi-turn execution: following a tool call error, smaller models often degenerate into repetitive invalid re-invocations, failing to interpret error feedback and self-correct. This brittleness hinders reliable real-...

large language models, tool calling, reinforcement learning, error recovery, Fission-GRPO, +5 more
Jan 22, 2026 · 3