Latest Large Language Models Research Papers

Research on large language models including GPT, Claude, Llama, and other transformer-based architectures for natural language understanding and generation.

189 Papers
Showing 20 of 20 papers

HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing

Yizhao Gao, Jianyu Wei, Qihao Zhang +11 more

This work introduces Hybrid Sparse Attention (HySparse), a new architecture that interleaves each full attention layer with several sparse attention layers. While conceptually simple, HySparse strategically derives each sparse layer's token selection and KV caches directly from the preceding full at...

Hybrid Sparse Attention, sparse attention layers, full attention layer, token selection, KV caches (+4 more)
Feb 3, 2026 · 32

Context Compression via Explicit Information Transmission

Jiangnan Ye, Hanqi Yan, Zhenyi Shen +3 more

Long-context inference with Large Language Models (LLMs) is costly due to quadratic attention and growing key-value caches, motivating context compression. In this work, we study soft context compression, where a long context is condensed into a small set of continuous representations. Existing meth...

soft context compression, LLMs, quadratic attention, key-value caches, layer-by-layer self-attention (+6 more)
Feb 3, 2026 · 13
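The quadratic-attention and KV-cache costs that motivate context compression can be made concrete with a back-of-the-envelope estimate. The model configuration and the 512-token compressed length below are illustrative assumptions, not numbers from the paper:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Per-sequence KV-cache size: keys + values (factor 2) stored at
    every layer, for every KV head, across the whole context (fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Illustrative 7B-class configuration (assumed):
full = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=128_000)
soft = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=512)

print(f"128k-token context: {full / 2**30:.2f} GiB")  # grows linearly with length
print(f"512 soft tokens:    {soft / 2**30:.2f} GiB")
```

Attention compute additionally scales quadratically with sequence length, so condensing the context into a small set of continuous representations shrinks both costs at once.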

No Shortcuts to Culture: Indonesian Multi-hop Question Answering for Complex Cultural Understanding

Vynska Amalia Permadi, Xingwei Tan, Nafise Sadat Moosavi +1 more

Understanding culture requires reasoning across context, tradition, and implicit social knowledge, far beyond recalling isolated facts. Yet most culturally focused question answering (QA) benchmarks rely on single-hop questions, which may allow models to exploit shallow cues rather than demonstrate ...

question answering, large language models, cultural understanding, multi-hop reasoning, dataset creation (+3 more)
Feb 3, 2026 · 8

Likelihood-Based Reward Designs for General LLM Reasoning

Ariel Kwiatkowski, Natasha Butt, Ismail Labiad +2 more

Fine-tuning large language models (LLMs) on reasoning benchmarks via reinforcement learning requires a specific reward function, often binary, for each benchmark. This comes with two potential limitations: the need to design the reward, and the potentially sparse nature of binary rewards. Here, we s...

large language models, reinforcement learning, reward function, binary rewards, likelihood-based rewards (+8 more)
Feb 3, 2026 · 4
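The contrast the abstract draws, sparse binary rewards versus dense likelihood-based ones, can be sketched as two reward functions; this is a generic illustration of the idea, not the paper's exact formulation:

```python
def binary_reward(sampled_answer, gold):
    """Sparse signal: 1 only on exact match, 0 otherwise."""
    return 1.0 if sampled_answer == gold else 0.0

def likelihood_reward(gold_token_logprobs):
    """Dense signal: length-normalized log-likelihood the model assigns
    to the gold answer's tokens; informative even when the match fails."""
    return sum(gold_token_logprobs) / len(gold_token_logprobs)

print(binary_reward("42", "43"))              # near-miss still scores 0
print(likelihood_reward([-0.2, -1.5, -0.7]))  # graded feedback near -0.8
```

A near-miss earns zero under the binary reward, so early RL gradients vanish; the likelihood-based reward stays informative for every sample and needs no benchmark-specific design.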

MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling

Ning Ding, Fangcheng Liu, Kyungrae Kim +4 more

Scaling Large Language Models (LLMs) typically relies on increasing the number of parameters or test-time computations to boost performance. However, these strategies are impractical for edge device deployment due to limited RAM and NPU resources. Despite hardware constraints, deploying performant L...

Transformer layer, token-level memory experts, re-parameterization strategy, parameter matrices, lookup table (+2 more)
Feb 3, 2026 · 9

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Ian Wu, Yuxiao Qu, Amrith Setlur +1 more

Large Language Models (LLMs) that can continually improve beyond their training budgets are able to solve increasingly difficult problems by adapting at test time, a property we refer to as extrapolation. However, standard reinforcement learning (RL) operates over fixed problem distributions and tra...

reinforcement learning, autoregressive decoding, reasoning chains, test-time performance, scaffolding (+2 more)
Feb 3, 2026 · 5

FASA: Frequency-aware Sparse Attention

Yifei Wang, Yueqi Wang, Zhenrui Yue +6 more

The deployment of Large Language Models (LLMs) faces a critical bottleneck when handling lengthy inputs: the prohibitive memory footprint of the Key Value (KV) cache. To address this bottleneck, the token pruning paradigm leverages attention sparsity to selectively retain a small, critical subset of...

Large Language Models, Key Value cache, token pruning, attention sparsity, query-dependent token importance (+14 more)
Feb 3, 2026 · 96
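The token-pruning paradigm the abstract refers to, retaining only a critical subset of cached tokens, can be sketched with a toy query-dependent importance score. This is a generic illustration of attention-based pruning, not FASA's frequency-aware criterion:

```python
import numpy as np

def prune_kv(query, keys, values, keep):
    """Rank cached tokens by their attention logit for the current query
    and keep only the top `keep` entries (toy importance measure)."""
    scores = keys @ query / np.sqrt(query.shape[-1])  # (seq_len,)
    top = np.sort(np.argsort(scores)[-keep:])         # keep positional order
    return keys[top], values[top], top

rng = np.random.default_rng(0)
q = rng.standard_normal(64)
K = rng.standard_normal((4096, 64))  # 4096 cached tokens
V = rng.standard_normal((4096, 64))
K_s, V_s, kept = prune_kv(q, K, V, keep=256)
print(K_s.shape, V_s.shape)          # cache shrunk 16x
```

Real systems aggregate importance across heads and queries rather than scoring against a single query vector, but the memory saving mechanism is the same: attention is sparse, so most KV entries can be dropped with little loss.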

CL-bench: A Benchmark for Context Learning

Shihan Dou, Ming Zhang, Zhangyue Yin +24 more

Current language models (LMs) excel at reasoning over prompts using pre-trained knowledge. However, real-world tasks are far more complex and context-dependent: models must learn from task-specific context and leverage new knowledge beyond what is learned during pre-training to reason and resolve ta...

language models, context learning, pre-trained knowledge, real-world tasks, in-context learning (+6 more)
Feb 3, 2026 · 15

Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics

Ziwen Xu, Chenyan Wu, Hengyu Sun +9 more

Methods for controlling large language models (LLMs), including local weight fine-tuning, LoRA-based adaptation, and activation-based interventions, are often studied in isolation, obscuring their connections and making comparison difficult. In this work, we present a unified view that frames these ...

local weight fine-tuning, LoRA-based adaptation, activation-based interventions, dynamic weight updates, preference-utility analysis (+5 more)
Feb 2, 2026 · 9

No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs

Liyan Xu, Mo Yu, Fandong Meng +1 more

This work stems from prior complementary observations on the dynamics of Chain-of-Thought (CoT): Large Language Models (LLMs) have been shown to latently plan subsequent reasoning before the CoT emerges, diminishing the significance of explicit CoT; yet CoT remains critical for tasks requirin...

Chain-of-Thought, Large Language Models, latent planning, hidden states, Tele-Lens (+3 more)
Feb 2, 2026 · 54

D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

Bowen Xu, Shaoyu Wu, Hao Jiang +4 more

Effective tool use and reasoning are essential capabilities for large reasoning models (LRMs) to address complex real-world problems. Through empirical analysis, we identify that current LRMs lack the capability of sub-task decomposition in complex tool use scenarios, leading to Lazy Reasoning. To a...

large reasoning models, sub-task decomposition, Lazy Reasoning, self-distillation, diversity-aware reinforcement learning (+1 more)
Feb 2, 2026 · 12

Breaking the Static Graph: Context-Aware Traversal for Robust Retrieval-Augmented Generation

Kwun Hang Lau, Fangyuan Zhang, Boyu Ruan +4 more

Recent advances in Retrieval-Augmented Generation (RAG) have shifted from simple vector similarity to structure-aware approaches like HippoRAG, which leverage Knowledge Graphs (KGs) and Personalized PageRank (PPR) to capture multi-hop dependencies. However, these methods suffer from a "Static Graph ...

Retrieval-Augmented Generation, Knowledge Graphs, Personalized PageRank, random walk, multi-hop dependencies (+4 more)
Feb 2, 2026 · 4
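Personalized PageRank, the graph-traversal primitive that HippoRAG-style methods build on, can be computed by simple power iteration: follow edges most of the time, teleport back to the query's seed nodes with probability alpha. The 3-node graph below is a made-up example, not from the paper:

```python
import numpy as np

def personalized_pagerank(P, seed, alpha=0.15, iters=100):
    """Power iteration for PPR. P is a row-stochastic transition matrix;
    seed is the personalization distribution (sums to 1)."""
    p = seed.copy()
    for _ in range(iters):
        p = alpha * seed + (1 - alpha) * (P.T @ p)
    return p

# Toy chain 0 -> 1 -> 2, with a self-loop on node 2:
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
seed = np.array([1.0, 0.0, 0.0])  # personalize on node 0
print(personalized_pagerank(P, seed).round(4))
```

Mass flows from the seed to its multi-hop neighbors, which is why PPR retrieves evidence scattered across documents; the "static graph" critique is that these edge weights are fixed regardless of the query's context.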

MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration

Lianhai Ren, Yucheng Ding, Xiao Liu +3 more

Training instability remains a critical challenge in large language model (LLM) pretraining, often manifesting as sudden gradient explosions that waste significant computational resources. We study training failures in a 5M-parameter NanoGPT model scaled via μP, identifying two key phenomena precedi...

large language model, pretraining, gradient explosions, weight matrix stable rank, Frobenius norm (+4 more)
Feb 2, 2026 · 24
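The stable rank that MSign restores has a standard definition: the squared Frobenius norm divided by the squared spectral norm of a weight matrix. A minimal computation, with made-up matrices rather than anything from the paper:

```python
import numpy as np

def stable_rank(W):
    """||W||_F^2 / ||W||_2^2: a smooth proxy for rank, always <= rank(W).
    It collapses toward 1 as one singular value dominates the matrix."""
    fro2 = np.sum(W * W)
    spec = np.linalg.norm(W, ord=2)  # largest singular value
    return fro2 / spec**2

print(stable_rank(np.eye(8)))                         # all singular values equal: 8.0
print(stable_rank(np.outer(np.ones(8), np.ones(8))))  # rank-1 matrix: 1.0
```

A collapsing stable rank means one direction is dominating a weight matrix, which is the kind of precursor to gradient explosions that an optimizer can monitor and counteract.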

Context Learning for Multi-Agent Discussion

Xingyuan Hua, Sheng Yue, Xinyi Li +3 more

Multi-Agent Discussion (MAD) has garnered increasing attention recently, where multiple LLM instances collaboratively solve problems via structured discussion. However, we find that current MAD methods easily suffer from discussion inconsistency, where LLMs fail to reach a coherent solution, due to t...

multi-LLM context learning, context generator, discussion consistency, context coherence, self-adaptive mechanism (+2 more)
Feb 2, 2026 · 3

CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation

Zhongyuan Peng, Caijun Xu, Changyi Xiao +4 more

Large Reasoning Models (LRMs) benefit substantially from training on challenging competition-level questions. However, existing automated question synthesis methods lack precise difficulty control, incur high computational costs, and struggle to generate competition-level questions at scale. In this...

Large Reasoning Models, test-time scaling, controllable difficulty control, CoDiQ-Generator, CoDiQ-Corpus (+3 more)
Feb 2, 2026 · 4

Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability

Xiao Liang, Zhong-Zhi Li, Zhenghao Lin +7 more

Large language models (LLMs) have demonstrated strong reasoning capabilities through step-by-step chain-of-thought (CoT) reasoning. Nevertheless, at the limits of model capability, CoT often proves insufficient, and its strictly sequential nature constrains test-time scalability. A potential alterna...

chain-of-thought reasoning, divide-and-conquer reasoning, reinforcement learning, large language models, policy decomposition (+4 more)
Feb 2, 2026 · 7

WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora

Pengyu Wang, Benfeng Xu, Licheng Zhang +4 more

Graph-based Retrieval-Augmented Generation (GraphRAG) organizes external knowledge as a hierarchical graph, enabling efficient retrieval and aggregation of scattered evidence across multiple documents. However, many existing benchmarks for GraphRAG rely on short, curated passages as external knowled...

GraphRAG, retrieval-augmented generation, hierarchical graph, external knowledge, long contexts (+6 more)
Feb 2, 2026 · 38

From Directions to Regions: Decomposing Activations in Language Models via Local Geometry

Or Shafran, Shaked Ronen, Omri Fahn +3 more

Activation decomposition methods in language models are tightly coupled to geometric assumptions on how concepts are realized in activation space. Existing approaches search for individual global directions, implicitly assuming linear separability, which overlooks concepts with nonlinear or multi-di...

Mixture of Factor Analyzers, activation space, Gaussian regions, local covariance structure, concept discovery (+2 more)
Feb 2, 2026 · 3

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Dylan Zhang, Yufeng Xu, Haojin Wang +2 more

Post-training of reasoning LLMs is a holistic process that typically consists of an offline SFT stage followed by an online reinforcement learning (RL) stage. However, SFT is often optimized in isolation to maximize SFT performance alone. We show that, after identical RL training, models initializ...

supervised fine-tuning, reinforcement learning, policy evaluation, importance sampling, loss re-weighting (+6 more)
Feb 1, 2026 · 10

LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents

Hyesung Jeon, Hyeongju Ha, Jae-Joon Kim

Role specialization in multi-LLM agent systems is often realized via multi-LoRA, where agents share a pretrained backbone and differ only through lightweight adapters. Despite sharing base model weights, each agent independently builds and stores its own KV cache for the same long, tool-augmented tr...

multi-LoRA, KV cache sharing, LoRA weights, pretrained backbone, Flash-LoRA-Attention (+3 more)
Feb 1, 2026 · 6
Page 5 of 10