Latest AI Agents Research Papers

Research on autonomous AI agents, tool use, planning, and systems that can take actions to accomplish goals.

154 Papers

Discovering Multiagent Learning Algorithms with Large Language Models

Zun Li, John Schultz, Daniel Hennes +1 more

Much of the advancement of Multi-Agent Reinforcement Learning (MARL) in imperfect-information games has historically depended on manual iterative refinement of baselines. While foundational families like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO) rest on solid ...

Multi-Agent Reinforcement Learning, imperfect-information games, Counterfactual Regret Minimization, Policy Space Response Oracles, evolutionary coding, +10 more
Feb 18, 2026 | 11

"What Are You Doing?": Effects of Intermediate Feedback from Agentic LLM In-Car Assistants During Multi-Step Processing

Johannes Kirmayr, Raphael Wennmacher, Khanh Huynh +3 more

Agentic AI assistants that autonomously perform multi-step tasks raise open questions for user experience: how should such systems communicate progress and reasoning during extended operations, especially in attention-critical contexts such as driving? We investigate feedback timing and verbosity fr...

agentic AI assistants, multi-step tasks, user experience, feedback timing, verbosity, +7 more
Feb 17, 2026 | 13

AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines

Yifan Wu, Yiran Peng, Yiyu Chen +12 more

The performance of autonomous Web GUI agents heavily relies on the quality and quantity of their training data. However, a fundamental bottleneck persists: collecting interaction trajectories from real-world websites is expensive and difficult to verify. The underlying state transitions are hidden, ...

Finite State Machines, coding agents, web environments, automated search-and-verify pipeline, synthetic data, +3 more
Feb 15, 2026 | 43

REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents

Zheng Chu, Xiao Wang, Jack Hong +11 more

Large language models are transitioning from general-purpose knowledge engines to real-world problem solvers, yet optimizing them for deep search tasks remains challenging. The central bottleneck lies in the extreme sparsity of high-quality search trajectories and reward signals, arising from the diffi...

task synthesis, dual-constrained optimization, graph topology, evidence dispersion, tool-augmented queries, +9 more
Feb 15, 2026 | 10

Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents

Haiyang Xu, Xi Zhang, Haowei Liu +16 more

The paper introduces GUI-Owl-1.5, the latest native GUI agent model that features instruct/thinking variants in multiple sizes (2B/4B/8B/32B/235B) and supports a range of platforms (desktop, mobile, browser, and more) to enable cloud-edge collaboration and real-time interaction. GUI-Owl-1.5 achieves...

GUI agent model, UI understanding, trajectory generation, simulated environments, cloud-based sandbox environments, +5 more
Feb 15, 2026 | 38

Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook

Ming Li, Xirui Li, Tianyi Zhou

As large language model agents increasingly populate networked environments, a fundamental question arises: do artificial intelligence (AI) agent societies undergo convergence dynamics similar to human social systems? Lately, Moltbook approximates a plausible future scenario in which autonomous agen...

large language model agents, networked environments, AI agent societies, semantic stabilization, lexical turnover, +5 more
Feb 15, 2026 | 24

Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts

Chen Yang, Guangyue Peng, Jiaying Zhu +12 more

We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a s...

unified generalist language model, reward modeling, reinforcement learning, tool-call turns, deep search, +9 more
Feb 13, 2026 | 18

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

Xiangyi Li, Wenbo Chen, Yimin Liu +37 more

Agent Skills are structured packages of procedural knowledge that augment LLM agents at inference time. Despite rapid adoption, there is no standard way to measure whether they actually help. We present SkillsBench, a benchmark of 86 tasks across 11 domains paired with curated Skills and determinist...

agent skills, LLM agents, SkillsBench, procedural knowledge, curated Skills, +4 more
Feb 13, 2026 | 45

Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use

Hanbing Liu, Chunhao Tian, Nan An +4 more

We study budget-constrained tool-augmented agents, where a large language model must solve multi-step tasks by invoking external tools under a strict monetary budget. We formalize this setting as sequential decision making in context space with priced and stochastic tool executions, making direct pl...

tool-augmented agents, large language model, sequential decision making, context space, stochastic tool executions, +3 more
Feb 12, 2026 | 3

Learning to Configure Agentic AI Systems

Aditya Taparia, Som Sagar, Ransalu Senanayake

Configuring LLM-based agent systems involves choosing workflows, tools, token budgets, and prompts from a large combinatorial design space, and is typically handled today by fixed large templates or hand-tuned heuristics. This leads to brittle behavior and unnecessary compute, since the same cumbers...

LLM-based agent systems, reinforcement learning, hierarchical policy, query-wise decision problem, agent configuration, +6 more
Feb 12, 2026 | 14

Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments

Romain Froger, Pierre Andrews, Matteo Bettini +21 more

We introduce Gaia2, a benchmark for evaluating large language model agents in realistic, asynchronous environments. Unlike prior static or synchronous evaluations, Gaia2 introduces scenarios where environments evolve independently of agent actions, requiring agents to operate under temporal constrai...

large language model agents, asynchronous environments, temporal constraints, multi-agent collaboration, write-action verifier, +4 more
Feb 12, 2026 | 8

LawThinker: A Deep Research Legal Agent in Dynamic Environments

Xinyu Yang, Chenlong Deng, Tongyu Wen +2 more

Legal reasoning requires not only correct outcomes but also procedurally compliant reasoning processes. However, existing methods lack mechanisms to verify intermediate reasoning steps, allowing errors such as inapplicable statute citations to propagate undetected through the reasoning chain. To add...

legal reasoning, autonomous agent, Explore-Verify-Memorize strategy, DeepVerifier module, knowledge accuracy, +4 more
Feb 12, 2026 | 31

CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion

Yusong Lin, Haiyang Wang, Shuzhe Wu +4 more

Agentic coding requires agents to effectively interact with runtime environments, e.g., command line interfaces (CLI), so as to complete tasks like resolving dependency issues, fixing system problems, etc. But it remains underexplored how such environment-intensive tasks can be obtained at scale to ...

CLI-Gym, LiberCoder, environment-intensive tasks, Dockerfile, agentic task, +2 more
Feb 11, 2026 | 8

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Ailin Huang, Ang Li, Aobo Kong +212 more

We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with ...

Mixture-of-Experts, sparse MoE, foundation model, active parameters, interleaved attention, +13 more
Feb 11, 2026 | 140

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

Qixing Zhou, Jiacheng Zhang, Haiyang Wang +9 more

Agents powered by large language models (LLMs) are increasingly adopted in the software industry, contributing code as collaborators or even autonomous developers. As their presence grows, it becomes important to assess the current boundaries of their coding abilities. Existing agentic coding benchm...

large language models, agentic coding, software development, feature-oriented development, execution-based evaluation, +6 more
Feb 11, 2026 | 14

GameDevBench: Evaluating Agentic Capabilities Through Game Development

Wayne Chi, Yixiong Fang, Arnav Yayavaram +8 more

Despite rapid progress on coding agents, progress on their multimodal counterparts has lagged behind. A key challenge is the scarcity of evaluation testbeds that combine the complexity of software development with the need for deep multimodal understanding. Game development provides such a testbed a...

coding agents, multimodal counterparts, evaluation testbed, software development, multimodal understanding, +6 more
Feb 11, 2026 | 14

TreeCUA: Efficiently Scaling GUI Automation with Tree-Structured Verifiable Evolution

Deyang Jiang, Jing Huang, Xuanle Zhao +6 more

Effectively scaling GUI automation is essential for computer-use agents (CUAs); however, existing work primarily focuses on scaling GUI grounding rather than the more crucial GUI planning, which requires more sophisticated data collection. In reality, the exploration process of a CUA across apps/des...

GUI automation, GUI planning, tree-structured trajectories, multi-agent collaborative framework, adaptive exploration algorithm, +3 more
Feb 10, 2026 | 4