
Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation

Hongzhou Zhu, Min Zhao, Guande He, Hang Su, Chongxuan Li, Jun Zhu
Published: February 2, 2026
Authors: 6
Word Count: 11,324

Revolutionizing real-time interactive video generation with Causal Forcing.

Abstract

To achieve real-time interactive video generation, current methods distill pretrained bidirectional video diffusion models into few-step autoregressive (AR) models, facing an architectural gap when full attention is replaced by causal attention. However, existing approaches do not bridge this gap theoretically. They initialize the AR student via ODE distillation, which requires frame-level injectivity: each noisy frame must map to a unique clean frame under the probability flow ODE (PF-ODE) of an AR teacher. Distilling an AR student from a bidirectional teacher violates this condition, preventing recovery of the teacher's flow map and instead inducing a conditional-expectation solution, which degrades performance. To address this issue, we propose Causal Forcing, which uses an AR teacher for ODE initialization, thereby bridging the architectural gap. Empirical results show that our method outperforms all baselines across all metrics, surpassing the SOTA Self Forcing by 19.3% in Dynamic Degree, 8.7% in VisionReward, and 16.7% in Instruction Following. Project page and code: https://thu-ml.github.io/CausalForcing.github.io/
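The conditional-expectation argument in the abstract can be illustrated with a small toy, which is not the paper's method or code. It assumes squared-error regression and uses hypothetical labels ("a", "b") for noisy frames and scalar values for clean frames. When one input is paired with a single target (the injective case), the best fit recovers that target. When the same input is paired with two different targets, the best fit is their average, which matches neither.

```python
from collections import defaultdict
from statistics import fmean


def mse_optimal_predictor(pairs):
    """Return the lookup table x -> prediction minimizing squared error.

    For squared error, the minimizer at each input x is the mean of the
    targets observed for that x, i.e. the conditional expectation E[y | x].
    """
    targets = defaultdict(list)
    for x, y in pairs:
        targets[x].append(y)
    return {x: fmean(ys) for x, ys in targets.items()}


# Hypothetical toy data: noisy-frame label -> clean-frame value.
# Injective case: every noisy frame has exactly one clean target.
injective_pairs = [("a", -1.0), ("a", -1.0), ("b", 1.0), ("b", 1.0)]

# Non-injective case (assumption for illustration): the same noisy frame "a"
# is paired with two different clean targets.
non_injective_pairs = [("a", -1.0), ("a", 1.0), ("b", 1.0), ("b", 1.0)]

injective_fit = mse_optimal_predictor(injective_pairs)
non_injective_fit = mse_optimal_predictor(non_injective_pairs)

print(injective_fit)      # "a" is mapped to its unique target
print(non_injective_fit)  # "a" is mapped to the average of its two targets
```

In this toy, the averaged prediction for "a" is a value that never occurs as a target, which is a simplified analogue of the degraded solution the abstract attributes to violating frame-level injectivity.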

Key Takeaways

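  1. Causal Forcing initializes the AR student via ODE distillation from an AR teacher rather than a bidirectional one, bridging the architectural gap between full and causal attention.

  2. Surpasses the SOTA Self Forcing by 19.3% in Dynamic Degree, 8.7% in VisionReward, and 16.7% in Instruction Following.

  3. Maintains high throughput suitable for real-time applications.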

Limitations

  • Relies on high-quality synthetic training data.

  • Substantial computational requirements for training.

Keywords

video diffusion models, autoregressive models, causal attention, bidirectional attention, ODE distillation, frame-level injectivity, PF-ODE, conditional-expectation solution, Self Forcing
