Test-Time Training with KV Binding Is Secretly Linear Attention
Junchen Liu, Sven Elflein, Or Litany +2 more
Test-time training (TTT) with KV binding as sequence modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping at test time. However, our analysis reveals multiple phenomena that contradict this memorization-based interpretation. Motivated by these f...