AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research

Yishan Li, Wentong Chen, Yukun Yan, Mingwei Li, Sen Mei, Xiaorong Wang, Kunpeng Liu, Xin Cong, Shuo Wang, Zhong Zhang, Yaxi Lu, Zhenghao Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun
Published: February 6, 2026
Authors: 15
Word Count: 8,579
Code: Includes code

AI research reports improve through interleaved writing and deepening instead of rigid upfront planning.

Abstract

Generating deep research reports requires large-scale information acquisition and the synthesis of insight-driven analysis, posing a significant challenge for current language models. Most existing approaches follow a plan-then-write paradigm, whose performance heavily depends on the quality of the initial outline. However, constructing a comprehensive outline itself demands strong reasoning ability, causing current deep research systems to rely almost exclusively on closed-source or online large models. This reliance raises practical barriers to deployment and introduces safety and privacy concerns for user-authored data. In this work, we present AgentCPM-Report, a lightweight yet high-performing local solution composed of a framework that mirrors the human writing process and an 8B-parameter deep research agent. Our framework uses a Writing As Reasoning Policy (WARP), which enables models to dynamically revise outlines during report generation. Under this policy, the agent alternates between Evidence-Based Drafting and Reasoning-Driven Deepening, jointly supporting information acquisition, knowledge refinement, and iterative outline evolution. To effectively equip small models with this capability, we introduce a Multi-Stage Agentic Training strategy, consisting of cold-start, atomic skill RL, and holistic pipeline RL. Experiments on DeepResearch Bench, DeepConsult, and DeepResearch Gym demonstrate that AgentCPM-Report outperforms leading closed-source systems, with substantial gains in Insight.
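
The alternation the abstract describes can be sketched as a small loop. This is a minimal illustration only, not the AgentCPM-Report implementation: the names `Section`, `draft_pending`, `run_interleaved`, `retrieve`, `write`, `find_gaps`, and `max_deepening_rounds` are all hypothetical, and the toy callables stand in for the retrieval and model calls a real agent would make. It assumes the report starts from a sparse outline, drafts each section from retrieved evidence, then looks for gaps and extends the outline before drafting again.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Section:
    """One outline entry; `draft` stays empty until the section is written."""
    title: str
    draft: str = ""


def draft_pending(topic: str, outline: List[Section],
                  retrieve: Callable[[str], List[str]],
                  write: Callable[[str, List[str]], str]) -> None:
    """Drafting step (hypothetical): write every section that has no draft yet."""
    for section in outline:
        if not section.draft:
            evidence = retrieve(f"{topic}: {section.title}")
            section.draft = write(section.title, evidence)


def run_interleaved(topic: str, initial_titles: List[str],
                    retrieve: Callable[[str], List[str]],
                    write: Callable[[str, List[str]], str],
                    find_gaps: Callable[[List[Section]], List[str]],
                    max_deepening_rounds: int = 3) -> List[Section]:
    """Alternate drafting and deepening, letting the outline grow as gaps appear."""
    outline = [Section(title) for title in initial_titles]
    draft_pending(topic, outline, retrieve, write)
    for _ in range(max_deepening_rounds):
        # Deepening step (hypothetical): propose section titles the drafts are missing.
        known = {section.title for section in outline}
        new_titles = [t for t in dict.fromkeys(find_gaps(outline)) if t not in known]
        if not new_titles:
            break  # no gaps found, so the outline is considered stable
        outline.extend(Section(title) for title in new_titles)
        draft_pending(topic, outline, retrieve, write)
    return outline


# Toy stand-ins for retrieval, writing, and gap detection (assumptions for the demo).
def toy_retrieve(query: str) -> List[str]:
    return [f"note on {query}"]


def toy_write(title: str, evidence: List[str]) -> str:
    return f"{title}: " + "; ".join(evidence)


def toy_find_gaps(outline: List[Section]) -> List[str]:
    titles = {section.title for section in outline}
    return [] if "Open questions" in titles else ["Open questions"]


report = run_interleaved("solid-state batteries", ["Background"],
                         toy_retrieve, toy_write, toy_find_gaps)
print([section.title for section in report])
```

In this toy run the outline starts with a single section and gains a second one after the first deepening pass, which is the contrast with plan-then-write: the structure is allowed to change after evidence has been gathered.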

Key Takeaways

  1. Writing and reasoning are intertwined; AI systems should interleave drafting with deepening rather than planning rigidly upfront.

  2. AgentCPM-Report uses sparse initial outlines and dynamically refines them based on gaps discovered during the writing process.

  3. The approach enables local deployment with smaller models by removing the dependence on a comprehensive upfront outline, which itself demands strong reasoning ability.

Limitations

  • Plan-then-write systems cap insight by fixing the report structure before evidence is discovered during writing.

  • Current deep research systems rely almost exclusively on closed-source or online large models, which raises deployment barriers and safety and privacy concerns.

Keywords

Writing As Reasoning Policy, WARP, Evidence-Based Drafting, Reasoning-Driven Deepening, Multi-Stage Agentic Training, cold-start, atomic skill RL, holistic pipeline RL, deep research agent, insight-driven analysis, plan-then-write paradigm
