Efficient AI

PISA: Piecewise Sparse Attention Is Wiser for Efficient Diffusion Transformers

Haopeng Li, Shitong Shao, Wenliang Zhong, Zikai Zhou, Lichen Bai, Hui Xiong, Zeke Xie
Published
February 1, 2026
Authors
7
Word Count
8,217
Code
Includes code

PISA enhances diffusion transformer efficiency without quality loss.

Abstract

Diffusion Transformers are fundamental for video and image generation, but their efficiency is bottlenecked by the quadratic complexity of attention. While block sparse attention accelerates computation by attending only to critical key-value blocks, it suffers from quality degradation at high sparsity because it discards context. In this work, we discover that the attention scores of non-critical blocks exhibit distributional stability, allowing them to be approximated accurately and efficiently rather than discarded, which is essential for sparse attention design. Motivated by this key insight, we propose PISA, a training-free Piecewise Sparse Attention that covers the full attention span with sub-quadratic complexity. Unlike the conventional keep-or-drop paradigm, which simply discards the information in non-critical blocks, PISA introduces a novel exact-or-approximate strategy: it maintains exact computation for critical blocks while efficiently approximating the remainder through block-wise Taylor expansion. This design allows PISA to serve as a faithful proxy to full attention, effectively bridging the gap between speed and quality. Experimental results demonstrate that PISA achieves 1.91 times and 2.57 times speedups on Wan2.1-14B and Hunyuan-Video, respectively, while consistently maintaining the highest quality among sparse attention methods. Notably, even for image generation on FLUX, PISA achieves a 1.2 times acceleration without compromising visual quality. Code is available at: https://github.com/xie-lab-ml/piecewise-sparse-attention.
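The exact-or-approximate strategy can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the block-selection rule (ranking blocks by their mean score), the function names, and the first-order expansion point (each block's mean score) are all assumptions for illustration. The key idea it shows is that for non-critical blocks, a first-order Taylor expansion exp(s) ≈ exp(s̄)(1 + s − s̄) lets the softmax numerator be computed from per-block summaries (a value sum and a key-value outer-product sum) instead of touching every key, while critical blocks keep the exact computation.

```python
import numpy as np

def full_attention(Q, K, V):
    """Reference dense softmax attention."""
    s = (Q @ K.T) / np.sqrt(K.shape[1])
    w = np.exp(s - s.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

def piecewise_sparse_attention(Q, K, V, block=4, top_k=2):
    """Toy exact-or-approximate piecewise attention (illustrative sketch,
    not the paper's kernel). Critical blocks are computed exactly; the rest
    use a first-order Taylor expansion around each block's mean score."""
    n, d = K.shape
    nb = n // block
    Kb = K.reshape(nb, block, d)
    Vb = V.reshape(nb, block, d)
    # Per-block summaries (precomputable once, independent of queries).
    k_mean = Kb.mean(axis=1)                    # (nb, d) mean key per block
    v_sum = Vb.sum(axis=1)                      # (nb, d) value sum per block
    kv_sum = np.einsum('bkd,bke->bde', Kb, Vb)  # (nb, d, d) sum_i k_i v_i^T

    scale = 1.0 / np.sqrt(d)
    out = np.empty_like(Q)
    for qi, q in enumerate(Q):
        s_bar = (k_mean @ q) * scale            # mean score per block (exact,
        crit = set(np.argsort(s_bar)[-top_k:])  # since scores are linear in k)
        num, den = np.zeros(d), 0.0
        for b in range(nb):
            if b in crit:
                w = np.exp((Kb[b] @ q) * scale)  # exact softmax numerator
                num += w @ Vb[b]
                den += w.sum()
            else:
                # sum_i exp(s_i) v_i ~= e^{s_bar}[(1 - s_bar) v_sum + sum_i s_i v_i],
                # where sum_i s_i v_i = scale * q @ (sum_i k_i v_i^T).
                e = np.exp(s_bar[b])
                num += e * ((1.0 - s_bar[b]) * v_sum[b] + scale * (q @ kv_sum[b]))
                den += e * block  # first-order terms cancel in the denominator
        out[qi] = num / den
    return out
```

With `top_k` equal to the number of blocks, the sketch reduces to exact full attention; lowering `top_k` trades accuracy for work that only scales with the per-block summaries.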

Key Takeaways

  • 1

    PISA achieves speedups while maintaining high-quality outputs.

  • 2

    PISA uses piecewise sparse attention for efficient computation.

  • 3

    PISA preserves intrinsic attention distribution effectively.

Limitations

  • Assumes distributional stability of non-critical attention blocks.

  • Requires powerful hardware for large-scale applications.

Keywords

diffusion transformers, attention, block sparse attention, sparse attention, piecewise sparse attention, block-wise Taylor expansion, attention scores, quadratic complexity, computational efficiency, visual quality
