AI Agents

daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently

Mohan Jiang, Dayuan Fu, Junhao Shi, Ji Zeng, Weiye Si, Keyu Li, Xuefeng Li, Yang Xiao, Wenjie Li, Dequan Wang, Pengfei Liu
Published: February 2, 2026
Authors: 11
Word Count: 8,883
Code: Included

Efficient long-horizon AI training using GitHub PRs.

Abstract

While Large Language Models (LLMs) excel at short-term tasks, scaling them to long-horizon agentic workflows remains challenging. The core bottleneck is the scarcity of training data that captures authentic long-dependency structures and cross-stage evolutionary dynamics: existing synthesis methods are either confined to single-feature scenarios constrained by the model's distribution or incur prohibitive human annotation costs, and thus fail to provide scalable, high-quality supervision. We address this by reconceptualizing data synthesis through the lens of real-world software evolution. Our key insight: Pull Request (PR) sequences naturally embody the supervision signals needed for long-horizon learning. They decompose complex objectives into verifiable submission units, maintain functional coherence across iterations, and encode authentic refinement patterns through bug-fix histories. Building on this, we propose daVinci-Agency, which systematically mines structured supervision from chains of PRs through three interlocking mechanisms: (1) progressive task decomposition via continuous commits, (2) long-term consistency enforcement through unified functional objectives, and (3) verifiable refinement from authentic bug-fix trajectories. Unlike synthetic trajectories that treat each step independently, daVinci-Agency's PR-grounded structure inherently preserves the causal dependencies and iterative refinements essential for teaching persistent goal-directed behavior, and it aligns naturally with project-level, full-cycle task modeling. The resulting trajectories are substantial (averaging 85k tokens and 116 tool calls) yet remarkably data-efficient: fine-tuning GLM-4.6 on 239 daVinci-Agency samples yields broad improvements across benchmarks, notably a 47% relative gain on Toolathlon. Beyond benchmark performance, our analysis confirms...
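The abstract describes mining chains of related PRs (capped at five per task, per the Limitations section) so that each chain forms one long-horizon training trajectory. The paper does not publish its chaining algorithm here, so the sketch below is a hypothetical illustration of one plausible heuristic: order merged PRs by merge time and group a PR with an existing chain when it touches files that chain has already modified. The `PullRequest` class and `build_pr_chains` function are invented names for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PullRequest:
    """Minimal stand-in for a merged PR record (hypothetical schema)."""
    number: int
    merged_at: datetime
    files: set            # file paths touched by this PR
    is_bugfix: bool = False

def build_pr_chains(prs, max_len=5):
    """Group merged PRs into chains of functionally related work.

    Heuristic sketch: PRs are visited in merge order; a PR joins the
    first open chain whose file set overlaps its own, otherwise it
    starts a new chain. Chains are capped at `max_len` PRs, mirroring
    the five-PR limit stated in the paper's limitations.
    """
    chains = []  # each chain: {"prs": [...], "files": set_of_paths}
    for pr in sorted(prs, key=lambda p: p.merged_at):
        placed = False
        for chain in chains:
            if len(chain["prs"]) < max_len and pr.files & chain["files"]:
                chain["prs"].append(pr)
                chain["files"] |= pr.files
                placed = True
                break
        if not placed:
            chains.append({"prs": [pr], "files": set(pr.files)})
    return [c["prs"] for c in chains]
```

Each resulting chain would then supply the ordered commit history, the unified functional objective (the shared files and feature), and any bug-fix PRs as refinement supervision; the overlap test here is only one of many possible linking criteria.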

Key Takeaways

  1. Leverages real-world PR sequences for long-horizon training.

  2. Achieves significant performance gains with minimal data.

  3. Improves token and tool-call efficiency on agentic tasks.

Limitations

  • Limited to five PRs per task chain.

  • Dependent on the quality of PR data.

Keywords

Large Language Models, long-horizon agentic workflows, training data, long-dependency structures, cross-stage evolutionary dynamics, data synthesis, Pull Request sequences, task decomposition, functional coherence, bug-fix histories, daVinci-Agency, continuous commits, unified functional objectives, verifiable refinement, causal dependencies, iterative refinements, goal-directed behavior, project-level task modeling, Toolathlon
