Diffusion In Diffusion: Reclaiming Global Coherence in Semi-Autoregressive Diffusion

Linrui Ma, Yufei Cui, Kai Han, Yunhe Wang
Published: January 20, 2026
Authors: 4
Word count: 5,229
Code: included

Speedy text generation with enhanced global coherence.

Abstract

One of the most compelling features of discrete diffusion language models is their globally bidirectional context. However, existing block-based diffusion methods introduce autoregressive priors that, despite their benefits, cause models to lose this coherence at the macro level. To regain global contextual understanding while preserving the advantages of the semi-autoregressive paradigm, we propose Diffusion in Diffusion, a 'draft-then-refine' framework designed to overcome the irreversibility and myopia inherent in block diffusion models. Our approach first uses block diffusion with small blocks to generate drafts rapidly, then refines these drafts through global bidirectional diffusion with a larger bidirectional receptive field. We use snapshot confidence remasking to identify the tokens that most need revision, and mix-scale training to extend the block diffusion model's global capabilities. Empirically, our approach sets a new benchmark for discrete diffusion models on the OpenWebText dataset: using only 26% of the fine-tuning budget of baseline models, we reduce generative perplexity from 25.7 to 21.9, significantly narrowing the gap with autoregressive models.
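The abstract's "snapshot confidence remasking" step selects the least-confident draft tokens for the global refinement pass to redo. The paper does not spell out the mechanics here, but the general idea can be sketched as follows (the mask id, the fixed remask ratio, and the function name are illustrative assumptions, not details from the paper):

```python
import numpy as np

MASK_ID = 0  # hypothetical mask-token id; the real vocabulary may differ


def snapshot_confidence_remask(draft_ids, confidences, remask_ratio=0.2):
    """Remask the lowest-confidence tokens of a drafted sequence.

    draft_ids:   (T,) int array of tokens produced by the block-diffusion draft
    confidences: (T,) float array of per-token confidences recorded at the
                 sampling snapshot (e.g. the sampled token's probability)
    Returns a copy of draft_ids with the least-confident tokens replaced by
    MASK_ID, ready to be refilled by a global bidirectional diffusion pass.
    """
    draft_ids = np.asarray(draft_ids).copy()
    confidences = np.asarray(confidences)
    k = max(1, int(remask_ratio * draft_ids.size))
    low_idx = np.argsort(confidences)[:k]  # indices of the k least-confident tokens
    draft_ids[low_idx] = MASK_ID
    return draft_ids
```

In this sketch the refinement model only has to regenerate the remasked positions, conditioning bidirectionally on the high-confidence draft tokens that remain.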

Key Takeaways

  • 1

    Combines speed of autoregressive models with global coherence.

  • 2

    Introduces a 'draft-then-refine' paradigm for better text generation.

  • 3

    Significantly reduces perplexity with less fine-tuning budget.

Limitations

  • Relies on larger blocks for global context.

  • Requires more computational power during refinement.

Keywords

discrete diffusion language models, global bidirectional contextual capability, block-based diffusion, autoregressive priors, semi-autoregressive paradigm, diffusion in diffusion, draft-then-refine framework, irreversibility, myopia, snapshot confidence remasking, mix-scale training, generative perplexity
