Latest AI Agents Research Papers

Research on autonomous AI agents, tool use, planning, and systems that can take actions to accomplish goals.

154 Papers

AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions

Xianyang Liu, Shangding Gu, Dawn Song

Large language model (LLM)-based agents are increasingly expected to negotiate, coordinate, and transact autonomously, yet existing benchmarks lack principled settings for evaluating language-mediated economic interaction among multiple agents. We introduce AgenticPay, a benchmark and simulation fra...

multi-agent negotiation, language-mediated interaction, strategic reasoning, economic interaction, buyer-seller markets, +3 more

Feb 5, 2026 · 3

ASA: Training-Free Representation Engineering for Tool-Calling Agents

Youjin Wang, Run Zhou, Rong Fu +6 more

Adapting LLM agents to domain-specific tool calling remains notably brittle under evolving interfaces. Prompt and schema engineering is easy to deploy but often fragile under distribution shift and strict parsers, while continual parameter-efficient fine-tuning improves reliability at the cost of tr...

tool calling, activation steering adapter, mid-layer activations, representation-behavior gap, steering vectors, +6 more

Feb 4, 2026 · 23

Vibe AIGC: A New Paradigm for Content Generation via Agentic Orchestration

Jiaheng Liu, Yuanxing Zhang, Shihao Li +1 more

For the past decade, the trajectory of generative artificial intelligence (AI) has been dominated by a model-centric paradigm driven by scaling laws. Despite significant leaps in visual fidelity, this approach has encountered a "usability ceiling" manifested as the Intent-Execution Gap (i.e., the ...

Vibe Coding, Vibe AIGC, agentic orchestration, Intent-Execution Gap, multi-agent workflows, +3 more

Feb 4, 2026 · 17

Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing

Zhaotian Weng, Antonis Antoniades, Deepak Nathani +3 more

Open-ended self-improving agents can autonomously modify their own structural designs to advance their capabilities and overcome the limits of pre-defined architectures, thus reducing reliance on human intervention. We introduce Group-Evolving Agents (GEA), a new paradigm for open-ended self-improve...

open-ended self-improving agents, structural design modification, evolutionary unit, experience sharing, exploratory diversity, +6 more

Feb 4, 2026 · 3

Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning

Yansong Ning, Jun Fang, Naiqiang Tan +1 more

Managing agent thought and observation during multi-turn agent-environment interactions is an emerging strategy to improve agent efficiency. However, existing studies treat entire interaction trajectories equally, overlooking that thought necessity and observation utility vary across turns. To ...

LLM agents, multi-turn interactions, thought necessity, observation utility, cold-start data, +5 more

Feb 4, 2026 · 11

WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

Zelai Xu, Zhexuan Xu, Ruize Zhang +7 more

Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability. In t...

Large Language Models, multi-agent systems, multi-agent reinforcement learning, lead-agent-subagent framework, parallel execution, +3 more

Feb 4, 2026 · 60

Learning to Repair Lean Proofs from Compiler Feedback

Evan Wang, Simon Chess, Daniel Lee +3 more

As neural theorem provers become increasingly agentic, the ability to interpret and act on compiler feedback is critical. However, existing Lean datasets consist almost exclusively of correct proofs, offering little supervision for understanding and repairing failures. We study Lean proof repair as ...

neural theorem provers, Lean, proof repair, compiler feedback, supervised learning, +3 more

Feb 3, 2026 · 24

A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces

Mingxuan Du, Benfeng Xu, Chiwei Zhu +4 more

Frontier language models have demonstrated strong reasoning and long-horizon tool-use capabilities. However, existing RAG systems fail to leverage these capabilities. They still rely on two paradigms: (1) designing an algorithm that retrieves passages in a single shot and concatenates them into the ...

RAG systems, retrieval-augmented generation, agentic frameworks, hierarchical retrieval interfaces, keyword search, +5 more

Feb 3, 2026 · 16

DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents

Jiahao Zhao, Shaoxuan Xu, Zhongxiang Sun +6 more

Recently, Diffusion Large Language Models (dLLMs) have demonstrated unique efficiency advantages, enabled by their inherently parallel decoding mechanism and flexible generation paradigm. Meanwhile, despite the rapid advancement of Search Agents, their practical deployment is constrained by a fundam...

Diffusion Large Language Models, ReAct agent paradigm, Latency Challenge, Agentic Supervised Fine-Tuning, Agentic Variance-Reduced Preference Optimization, +2 more

Feb 3, 2026 · 25

FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation

Zimu Lu, Houxing Ren, Yunqiao Yang +4 more

Assisting non-expert users to develop complex interactive websites has become a popular task for LLM-powered code agents. However, existing code agents tend to only generate frontend web pages, masking the lack of real full-stack data processing and storage with fancy visual effects. Notably, constr...

multi-agent framework, code editing, codebase navigation, bug localization, data-scaling, +7 more

Feb 3, 2026 · 8

AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration

Jianhao Ruan, Zhihao Xu, Yiran Peng +8 more

Language agents have shown strong promise for task automation. Realizing this promise for increasingly complex, long-horizon tasks has driven the rise of a sub-agent-as-tools paradigm for multi-turn task solving. However, existing designs still lack a dynamic abstraction view of sub-agents, thereby ...

language agents, sub-agent-as-tools paradigm, multi-turn task solving, agent abstraction, task automation, +7 more

Feb 3, 2026 · 47

SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training

Huatong Song, Lisheng Huang, Shuang Sun +10 more

In this technical report, we present SWE-Master, an open-source and fully reproducible post-training framework for building effective software engineering agents. SWE-Master systematically explores the complete agent development pipeline, including teacher-trajectory synthesis and data curation, lon...

post-training framework, teacher-trajectory synthesis, data curation, long-horizon SFT, RL with real execution feedback, +5 more

Feb 3, 2026 · 27

Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation

Changze Lv, Jie Zhou, Wentao Zhao +10 more

Nowadays, training and evaluating DeepResearch-generated reports remain challenging due to the lack of verifiable reward signals. Accordingly, rubric-based evaluation has become a common practice. However, existing approaches either rely on coarse, pre-defined rubrics that lack sufficient granularit...

reinforcement learning, hybrid reward, human preference supervision, LLM-based rubric evaluation, Multi-agent Markov-state, +3 more

Feb 3, 2026 · 20

SWE-World: Building Software Engineering Agents in Docker-Free Environments

Shuang Sun, Huatong Song, Lisheng Huang +11 more

Recent advances in large language models (LLMs) have enabled software engineering agents to tackle complex code modification tasks. Most existing approaches rely on execution feedback from containerized environments, which require dependency-complete setup and physical execution of programs and test...

large language models, software engineering agents, Docker-free framework, surrogate models, agent-environment interaction, +3 more

Feb 3, 2026 · 29

LatentMem: Customizing Latent Memory for Multi-Agent Systems

Muxin Fu, Guibin Zhang, Xiangyuan Xue +6 more

Large language model (LLM)-powered multi-agent systems (MAS) demonstrate remarkable collective intelligence, wherein multi-agent memory serves as a pivotal mechanism for continual adaptation. However, existing multi-agent memory designs remain constrained by two fundamental bottlenecks: (i) memory h...

multi-agent systems, multi-agent memory, latent memory, experience bank, memory composer, +4 more

Feb 3, 2026 · 12

Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration

Bowei He, Minda Hu, Zenan Xu +7 more

Search-integrated reasoning enables language agents to transcend static parametric knowledge by actively querying external sources. However, training these agents via reinforcement learning is hindered by the multi-scale credit assignment problem: existing methods typically rely on sparse, trajector...

reinforcement learning, search-integrated reasoning, Actor-Refiner collaboration, multi-scale credit assignment, trajectory-level rewards, +6 more

Feb 3, 2026 · 4

MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments

Guangyi Liu, Pengxiang Zhao, Yaozhen Liang +12 more

Current mobile GUI agent benchmarks systematically fail to assess memory capabilities, with only 5.2-11.8% memory-related tasks and no cross-session learning evaluation. We introduce MemGUI-Bench, a comprehensive memory-centric benchmark with pass@k and staged LLM-as-judge evaluation. Our contributi...

memory-centric benchmark, LLM-as-judge evaluation, memory taxonomy, cross-temporal retention, cross-spatial retention, +6 more

Feb 3, 2026 · 12

Scaling Small Agents Through Strategy Auctions

Lisa Alazraki, William F. Shen, Yoram Bachrach +1 more

Small language models are increasingly viewed as a promising, cost-effective approach to agentic AI, with proponents claiming they are sufficiently capable for agentic workflows. However, while smaller agents can closely match larger ones on simple tasks, it remains unclear how their performance sca...

small language models, agentic AI, deep search, coding tasks, Strategy Auctions for Workload Efficiency, +10 more

Feb 2, 2026 · 3

A2Eval: Agentic and Automated Evaluation for Embodied Brain

Shuai Zhang, Jiayu Hu, Zijie Chen +9 more

Current embodied VLM evaluation relies on static, expert-defined, manually annotated benchmarks that exhibit severe redundancy and coverage imbalance. This labor-intensive paradigm drains computational and annotation resources, inflates costs, and distorts model rankings, ultimately stifling iterati...

embodied VLM, benchmark curation, evaluation suite, computational costs, ranking bias, +3 more

Feb 2, 2026 · 7

FIRE-Bench: Evaluating Agents on the Rediscovery of Scientific Insights

Zhen Wang, Fan Bai, Zhongyan Luo +9 more

Autonomous agents powered by large language models (LLMs) promise to accelerate scientific discovery end-to-end, but rigorously evaluating their capacity for verifiable discovery remains a central challenge. Existing benchmarks face a trade-off: they either heavily rely on LLM-as-judge evaluations o...

large language models, autonomous agents, scientific discovery, research outputs, machine learning research, +7 more

Feb 2, 2026 · 5
