Large Language Models

D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

Bowen Xu, Shaoyu Wu, Hao Jiang, Kai Liu, Xin Chen, Lulu Hu, Bin Yang
Published
February 2, 2026
Authors
7
Word Count
13,722
Code
Includes code

D-CORE boosts tool-use performance in large reasoning models via task decomposition.

Abstract

Effective tool use and reasoning are essential capabilities for large reasoning models (LRMs) to address complex real-world problems. Through empirical analysis, we identify that current LRMs lack the capability of sub-task decomposition in complex tool-use scenarios, leading to Lazy Reasoning. To address this, we propose a two-stage training framework, D-CORE (Decomposing tasks and Composing Reasoning processes), that first incentivizes the LRMs' task decomposition reasoning capability via self-distillation, followed by diversity-aware reinforcement learning (RL) to restore the LRMs' reflective reasoning capability. D-CORE achieves robust tool-use improvements across diverse benchmarks and model scales. Experiments on BFCLv3 demonstrate the superiority of our method: D-CORE-8B reaches 77.7% accuracy, surpassing the best-performing 8B model by 5.7%. Meanwhile, D-CORE-14B establishes a new state of the art at 79.3%, outperforming 70B models despite being 5× smaller. The source code is available at https://github.com/alibaba/EfficientAI.

Key Takeaways

  1. D-CORE enhances task decomposition in large reasoning models.

  2. Two-stage training process improves reflective reasoning.

  3. Effective tool use is crucial for real-world applications.

Limitations

  • Requires initial reference trajectories and few-shot examples.

  • Potential challenges in scaling to extremely complex tasks.

Keywords

large reasoning models, sub-task decomposition, Lazy Reasoning, self-distillation, diversity-aware reinforcement learning, BFCLv3
