Large Language Models

LLaDA2.1: Speeding Up Text Diffusion via Token Editing

Tiwei Bie, Maosong Cao, Xiang Cao, Bingsen Chen, Fuyuan Chen, Kun Chen, Lun Du, Daozhuo Feng, Haibo Feng, Mingliang Gong, Zhuocheng Gong, Yanmei Gu, Jian Guan, Kaiyuan Guan, Hongliang He, Zenan Huang, Juyong Jiang, Zhonghui Jiang, Zhenzhong Lan, Chengxi Li, Jianguo Li, Zehuan Li, Huabin Liu, Lin Liu, Guoshan Lu, Yuan Lu, Yuxin Ma, Xingyu Mou, Zhenxuan Pan, Kaida Qiu, Yuji Ren, Jianfeng Tan, Yiding Tian, Zian Wang, Lanning Wei, Tao Wu, Yipeng Xing, Wentao Ye, Liangyu Zha, Tianze Zhang, Xiaolu Zhang, Junbo Zhao, Da Zheng, Hao Zhong, Wanli Zhong, Jun Zhou, Junlin Zhou, Liwang Zhu, Muzhi Zhu, Yihong Zhuang
Published: February 9, 2026
Authors: 50

Abstract

While LLaDA2.0 showcased the scaling potential of 100B-level block-diffusion models and their inherent parallelization, the balance between decoding speed and generation quality has remained an elusive frontier. Today, we unveil LLaDA2.1, designed to transcend this trade-off. By weaving Token-to-Token (T2T) editing into the conventional Mask-to-Token (M2T) scheme, we introduce a joint, configurable threshold-decoding scheme. This structural innovation gives rise to two distinct personas: the Speedy Mode (S Mode), which aggressively lowers the M2T threshold to commit tokens early while relying on T2T editing to refine the output; and the Quality Mode (Q Mode), which leans into conservative thresholds to secure superior benchmark performance with only a manageable efficiency degradation. Furthering this evolution, and underpinned by an expansive context window, we implement the first large-scale Reinforcement Learning (RL) framework specifically tailored for dLLMs, anchored by specialized techniques for stable gradient estimation. This alignment not only sharpens reasoning precision but also elevates instruction-following fidelity, bridging the gap between diffusion dynamics and complex human intent. We culminate this work by releasing LLaDA2.1-Mini (16B) and LLaDA2.1-Flash (100B). Across 33 rigorous benchmarks, LLaDA2.1 delivers strong task performance and fast decoding. Despite its 100B parameter scale, on coding tasks it attains 892 TPS on HumanEval+, 801 TPS on BigCodeBench, and 663 TPS on LiveCodeBench.
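The abstract's joint threshold-decoding scheme can be sketched in a few lines. The code below is an illustrative toy, not the paper's implementation: the function name `decode_step`, the `MASK` sentinel, and the per-position confidence rule are all assumptions; a real dLLM would derive `probs` from the denoiser over a block, not from a fixed table.

```python
MASK = None  # sentinel for a still-masked position (assumption, not the paper's encoding)

def decode_step(probs, seq, m2t_threshold, t2t_threshold):
    """One joint threshold-decoding step (illustrative sketch).

    probs: per-position distributions, probs[i][v] = P(token v at position i).
    seq:   current sequence; MASK marks positions not yet filled.
    """
    out = list(seq)
    for i, dist in enumerate(probs):
        top = max(range(len(dist)), key=dist.__getitem__)  # argmax token id
        conf = dist[top]
        if seq[i] is MASK:
            # Mask-to-Token: commit a token once confidence clears the M2T threshold.
            if conf >= m2t_threshold:
                out[i] = top
        elif top != seq[i] and conf >= t2t_threshold:
            # Token-to-Token: edit an already-committed token that the model
            # now disagrees with at high confidence.
            out[i] = top
    return out
```

In this toy picture, S Mode corresponds to a low `m2t_threshold` (commit aggressively, let T2T editing repair early mistakes in later steps), while Q Mode raises the threshold so fewer, higher-confidence tokens are committed per step.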

Keywords

block-diffusion models, decoding speed, generation quality, Token-to-Token editing, Mask-to-Token scheme, threshold-decoding scheme, Speedy Mode, Quality Mode, Reinforcement Learning, gradient estimation, reasoning precision, instruction-following, large language diffusion models, HumanEval+, BigCodeBench, LiveCodeBench
