HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
Yizhao Gao, Jianyu Wei, Qihao Zhang +11 more
This work introduces Hybrid Sparse Attention (HySparse), a new architecture that interleaves each full attention layer with several sparse attention layers. While conceptually simple, HySparse strategically derives each sparse layer's token selection and KV caches directly from the preceding full at...
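The interleaving and sharing idea described above can be illustrated with a minimal single-query sketch in pure Python. It is not the authors' implementation. It assumes a top-k rule over the full layer's attention weights as the selection criterion, which this excerpt does not specify. All names (`build_layer_schedule`, `full_layer`, `sparse_layer`, `top_k`, `sparse_per_full`) are hypothetical and chosen only for illustration.

```python
import math

def build_layer_schedule(num_full_layers, sparse_per_full):
    """Interleave each full attention layer with several sparse layers."""
    schedule = []
    for _ in range(num_full_layers):
        schedule.append("full")
        schedule.extend(["sparse"] * sparse_per_full)
    return schedule

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values, indices):
    """Single-query softmax attention restricted to the given token indices."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in indices])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, indices))
           for d in range(dim)]
    return out, weights

def full_layer(query, keys, values, top_k):
    """Full attention over all tokens; also returns the selected token indices.

    ASSUMPTION: selection keeps the top_k most-attended tokens. The actual
    selection rule is not given in the excerpt.
    """
    all_idx = list(range(len(keys)))
    out, weights = attend(query, keys, values, all_idx)
    ranked = sorted(all_idx, key=lambda i: weights[i], reverse=True)
    return out, sorted(ranked[:top_k])

def sparse_layer(query, keys, values, selected):
    """Sparse attention reusing the preceding full layer's KV cache and selection."""
    out, _ = attend(query, keys, values, selected)
    return out

# Toy example: 4 cached tokens, head dimension 2.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
query = [2.0, 1.0]

schedule = build_layer_schedule(num_full_layers=2, sparse_per_full=2)
full_out, selected = full_layer(query, keys, values, top_k=2)
# The sparse layer receives the same keys/values objects (shared KV cache)
# and only the indices chosen by the full layer.
sparse_out = sparse_layer(query, keys, values, selected)
```

In this sketch the sparse layer computes no selection of its own and stores no keys or values of its own. It attends only to the indices handed down by the full layer, over the same cache, which is the sharing the description refers to.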