AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research

Yishan Li, Wentong Chen, Yukun Yan, Mingwei Li, Sen Mei, Xiaorong Wang, Kunpeng Liu, Xin Cong, Shuo Wang, Zhong Zhang, Yaxi Lu, Zhenghao Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun

Published: February 6, 2026
Authors: 15
Word Count: 8,579
Code: Includes code

AI-generated research reports improve when writing and deepening are interleaved, rather than following a rigid upfront plan.

Abstract

Generating deep research reports requires large-scale information acquisition and the synthesis of insight-driven analysis, posing a significant challenge for current language models. Most existing approaches follow a plan-then-write paradigm, whose performance heavily depends on the quality of the initial outline. However, constructing a comprehensive outline itself demands strong reasoning ability, causing current deep research systems to rely almost exclusively on closed-source or online large models. This reliance raises practical barriers to deployment and introduces safety and privacy concerns for user-authored data. In this work, we present AgentCPM-Report, a lightweight yet high-performing local solution composed of a framework that mirrors the human writing process and an 8B-parameter deep research agent. Our framework uses a Writing As Reasoning Policy (WARP), which enables models to dynamically revise outlines during report generation. Under this policy, the agent alternates between Evidence-Based Drafting and Reasoning-Driven Deepening, jointly supporting information acquisition, knowledge refinement, and iterative outline evolution. To effectively equip small models with this capability, we introduce a Multi-Stage Agentic Training strategy, consisting of cold-start, atomic skill RL, and holistic pipeline RL. Experiments on DeepResearch Bench, DeepConsult, and DeepResearch Gym demonstrate that AgentCPM-Report outperforms leading closed-source systems, with substantial gains in Insight.
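The abstract describes WARP only at a high level. The sketch below is a minimal, hypothetical illustration of the control flow it names: an Evidence-Based Drafting pass followed by a Reasoning-Driven Deepening pass that may insert new sections, repeated until the outline stops changing. The functions `search` and `find_gaps` are assumed stand-ins for model and retrieval calls. They, `Section`, `warp_loop`, and `max_rounds` are illustrative names, not the paper's API, and the real agent's prompts and stopping criteria are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Section:
    title: str
    text: str = ""
    evidence: list[str] = field(default_factory=list)


def draft_pending(sections: list[Section], search: Callable[[str], list[str]]) -> None:
    """Drafting pass: write every not-yet-drafted section from retrieved evidence."""
    for sec in sections:
        if not sec.text:
            sec.evidence = search(sec.title)
            sec.text = " ".join(sec.evidence)


def warp_loop(
    outline: list[str],
    search: Callable[[str], list[str]],
    find_gaps: Callable[[Section], list[str]],
    max_rounds: int = 3,
) -> list[Section]:
    """Alternate drafting and deepening until no new gaps appear or rounds run out."""
    sections = [Section(title) for title in outline]
    for _ in range(max_rounds):
        draft_pending(sections, search)
        # Deepening pass: inspect each draft and insert new sections right after it.
        seen = {sec.title for sec in sections}
        revised: list[Section] = []
        for sec in sections:
            revised.append(sec)
            for gap in find_gaps(sec):
                if gap not in seen:
                    seen.add(gap)
                    revised.append(Section(gap))
        if len(revised) == len(sections):
            break  # outline is stable, so stop iterating
        sections = revised
    draft_pending(sections, search)  # draft anything added in the last round
    return sections


# Toy stand-ins for model calls (hypothetical, for illustration only).
def toy_search(query: str) -> list[str]:
    return [f"fact about {query}"]


def toy_gaps(sec: Section) -> list[str]:
    # Pretend the model notices a missing comparison only in "Methods".
    return ["Methods: comparison"] if sec.title == "Methods" else []


report = warp_loop(["Background", "Methods"], toy_search, toy_gaps)
print([sec.title for sec in report])  # → ['Background', 'Methods', 'Methods: comparison']
```

The design choice worth noting is that new sections are inserted directly after the section whose draft exposed the gap, so the outline evolves in place instead of being fixed before any evidence is read.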

Key Takeaways

1. Writing and reasoning are intertwined; AI systems should interleave drafting with deepening rather than planning rigidly upfront.
2. AgentCPM-Report uses sparse initial outlines and dynamically refines them based on gaps discovered during the writing process.
3. The approach enables local deployment with a smaller (8B-parameter) model by removing the need to construct a comprehensive, reasoning-heavy outline upfront.

Limitations

Plan-then-write systems impose an insight ceiling by fixing the report structure before the evidence uncovered during writing is available.
Current deep research systems rely almost exclusively on large closed-source or online models, which complicates deployment and raises privacy and security concerns for user-authored data.

Keywords

Writing As Reasoning Policy, WARP, Evidence-Based Drafting, Reasoning-Driven Deepening, Multi-Stage Agentic Training, cold-start, atomic skill RL, holistic pipeline RL, deep research agent, insight-driven analysis, plan-then-write paradigm
