Latest Large Language Models Research Papers

Research on large language models including GPT, Claude, Llama, and other transformer-based architectures for natural language understanding and generation.

192 Papers
Showing 20 of 192 papers

Context Learning for Multi-Agent Discussion

Xingyuan Hua, Sheng Yue, Xinyi Li +3 more

Multi-Agent Discussion (MAD) has garnered increasing attention very recently, where multiple LLM instances collaboratively solve problems via structured discussion. However, we find that current MAD methods easily suffer from discussion inconsistency, where LLMs fail to reach a coherent solution, due to t...

multi-LLM context learning · context generator · discussion consistency · context coherence · self-adaptive mechanism (+2 more)
Feb 2, 2026

LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents

Hyesung Jeon, Hyeongju Ha, Jae-Joon Kim

Role specialization in multi-LLM agent systems is often realized via multi-LoRA, where agents share a pretrained backbone and differ only through lightweight adapters. Despite sharing base model weights, each agent independently builds and stores its own KV cache for the same long, tool-augmented tr...

multi-LoRA · KV cache sharing · LoRA weights · pretrained backbone · Flash-LoRA-Attention (+3 more)
Feb 1, 2026
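The redundancy this entry targets is easy to sketch: when agents share a frozen backbone, the base-model K/V tensors for a common prefix are identical across agents and can be stored once. The sketch below is illustrative only; the class and agent names are hypothetical and not from the paper.

```python
# Illustrative sketch: one KV cache per (layer, prefix) instead of one per
# agent. All names here (SharedKVCache, agent roles) are made up.

class SharedKVCache:
    def __init__(self):
        self._store = {}   # (layer, prefix) -> (keys, values)
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, layer, prefix, compute_kv):
        key = (layer, prefix)
        if key in self._store:
            self.hits += 1      # another agent already built this entry
        else:
            self.misses += 1    # first agent to see this prefix computes it
            self._store[key] = compute_kv(prefix)
        return self._store[key]

def fake_kv(prefix):
    # Stand-in for the base model's K/V projections over the prefix.
    return ([hash((t, "k")) % 97 for t in prefix],
            [hash((t, "v")) % 97 for t in prefix])

cache = SharedKVCache()
trajectory = ("user", "tool", "obs")   # same long prefix seen by every agent
for agent in ["planner", "coder", "critic"]:
    for layer in range(2):
        cache.get_or_compute(layer, trajectory, fake_kv)

print(cache.misses, cache.hits)   # -> 2 4 (computed once per layer, reused 4x)
```

With per-agent caches the same prefix would be projected 3 × 2 = 6 times; sharing reduces that to one computation per layer.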

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Dylan Zhang, Yufeng Xu, Haojin Wang +2 more

Post-training of reasoning LLMs is a holistic process that typically consists of an offline SFT stage followed by an online reinforcement learning (RL) stage. However, SFT is often optimized in isolation to maximize SFT performance alone. We show that, after identical RL training, models initializ...

supervised fine-tuning · reinforcement learning · policy evaluation · importance sampling · loss re-weighting (+6 more)
Feb 1, 2026
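The tags mention importance sampling and loss re-weighting; a generic form of that idea is to weight each SFT sequence's negative log-likelihood by a clipped ratio between the current policy and the data distribution. This is a minimal sketch of that general mechanism, not the paper's actual objective; the clip value and per-sequence weighting are assumptions.

```python
# Hedged sketch: importance-weighted SFT loss. The clipping constant and the
# sequence-level (rather than token-level) ratio are illustrative assumptions.
import math

def iw_sft_loss(logp_policy, logp_data, clip=5.0):
    """Weighted average NLL, each sequence re-weighted by a clipped
    importance ratio pi(y|x) / mu(y|x)."""
    total, norm = 0.0, 0.0
    for lp_pi, lp_mu in zip(logp_policy, logp_data):
        w = min(math.exp(lp_pi - lp_mu), clip)   # clipped importance ratio
        total += w * (-lp_pi)                    # weighted NLL term
        norm += w
    return total / norm

# Sequences the policy already likes get up-weighted relative to unlikely ones.
loss = iw_sft_loss(logp_policy=[-1.0, -3.0], logp_data=[-1.5, -1.0])
```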

Rethinking Selective Knowledge Distillation

Almog Tavor, Itay Ebenspanger, Neil Cnaan +1 more

Growing efforts to improve knowledge distillation (KD) in large language models (LLMs) replace dense teacher supervision with selective distillation, which uses a subset of token positions, vocabulary classes, or training samples for supervision. However, it remains unclear which importance signals,...

knowledge distillation · large language models · selective distillation · autoregressive models · student-entropy-guided position selection (+2 more)
Feb 1, 2026
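One of the selection signals named in the tags, student entropy, can be sketched in a few lines: distill only at the token positions where the student's predictive distribution is most uncertain. The top-k criterion below is an illustrative assumption, not necessarily the paper's rule.

```python
# Hedged sketch of student-entropy-guided position selection for selective KD.
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_positions(student_probs, k):
    """Indices of the k positions with highest student entropy."""
    ents = [(entropy(p), i) for i, p in enumerate(student_probs)]
    ents.sort(reverse=True)
    return sorted(i for _, i in ents[:k])

student = [
    [0.97, 0.01, 0.01, 0.01],   # confident -> skip teacher supervision
    [0.25, 0.25, 0.25, 0.25],   # maximally uncertain -> distill here
    [0.40, 0.30, 0.20, 0.10],   # uncertain -> distill here
]
print(select_positions(student, k=2))   # -> [1, 2]
```

The teacher's KL term would then be computed only at the returned positions, cutting supervision cost while focusing on where the student needs it.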

Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training

Shengrui Li, Fei Zhao, Kaiyan Zhao +6 more

Determining an effective data mixture is a key factor in Large Language Model (LLM) pre-training, where models must balance general competence with proficiency on hard tasks such as math and code. However, identifying an optimal mixture remains an open challenge, as existing approaches either rely o...

Large Language Model · pre-training · data mixture · model merging · weighted model merging (+4 more)
Jan 31, 2026
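The "weighted model merging" tag suggests the general trick of combining per-domain model weights linearly instead of retraining on every candidate mixture. The sketch below shows only that generic linear rule; treating mixture weights as merge coefficients is an assumption for illustration.

```python
# Hedged sketch: weighted model merging as a cheap proxy for data-mixture
# search. Parameters are plain lists of floats; real models would use tensors.

def merge(models, mix):
    """models: {domain: {param_name: [floats]}}; mix: {domain: weight}."""
    any_model = next(iter(models.values()))
    merged = {}
    for name, params in any_model.items():
        merged[name] = [
            sum(mix[d] * models[d][name][i] for d in models)
            for i in range(len(params))
        ]
    return merged

models = {
    "general": {"w": [1.0, 0.0]},   # model trained on general web text
    "math":    {"w": [0.0, 1.0]},   # model trained on math/code
}
mixed = merge(models, {"general": 0.7, "math": 0.3})
print(mixed["w"])   # -> [0.7, 0.3]
```

Searching over `mix` then only costs one merge plus one evaluation per candidate, decoupling the search from any retraining.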

TTCS: Test-Time Curriculum Synthesis for Self-Evolving

Chengyi Yang, Zhishang Xiang, Yunbo Tang +5 more

Test-Time Training offers a promising way to improve the reasoning ability of large language models (LLMs) by adapting the model using only the test questions. However, existing methods struggle with difficult reasoning problems for two reasons: raw test questions are often too difficult to yield hi...

test-time training · large language models · pseudo-labels · self-consistency rewards · question synthesizer (+5 more)
Jan 30, 2026

Position: Agentic Evolution is the Path to Evolving LLMs

Minhua Lin, Hanqing Lu, Zhan Shi +11 more

As Large Language Models (LLMs) move from curated training sets into open-ended real-world environments, a fundamental limitation emerges: static training cannot keep pace with continual deployment environment change. Scaling training-time and inference-time compute improves static capability but do...

Large Language Models · deployment-time adaptation · parametric fine-tuning · heuristic memory accumulation · agentic evolution (+2 more)
Jan 30, 2026

BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation

Jingwen Xu, Yiyang Lu, Zisu Huang +9 more

Training LLMs for code-related tasks typically depends on high-quality code-documentation pairs, which are costly to curate and often scarce for niche programming languages. We introduce BatCoder, a self-supervised reinforcement learning framework designed to jointly optimize code generation and doc...

self-supervised reinforcement learning · back-translation · code generation · documentation production · semantic similarity (+4 more)
Jan 30, 2026

MemoryLLM: Plug-n-Play Interpretable Feed-Forward Memory for Transformers

Ajay Jaiswal, Lauren Hannah, Han-Byul Kim +4 more

Understanding how transformer components operate in LLMs is important, as it is at the core of recent technological advances in artificial intelligence. In this work, we revisit the challenges associated with interpretability of feed-forward modules (FFNs) and propose MemoryLLM, which aims to decoup...

transformer components · feed-forward modules · self-attention · interpretability · MemoryLLM (+4 more)
Jan 30, 2026
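Interpretability work on FFNs commonly builds on the view of a feed-forward block as a key-value memory: each hidden unit holds a key vector and a value vector, and the output is a sum of values weighted by how strongly the input matches each key. The toy vectors below are made up purely to illustrate that standard view, not MemoryLLM's specific design.

```python
# Sketch of the "FFN as key-value memory" view: out = sum_i ReLU(x . k_i) * v_i

def ffn_as_memory(x, keys, values):
    out = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        act = max(0.0, sum(xi * ki for xi, ki in zip(x, k)))  # ReLU(x . k_i)
        for j in range(len(out)):
            out[j] += act * v[j]   # activated key contributes its value
    return out

keys   = [[1.0, 0.0], [0.0, 1.0]]   # each row: one memory "key"
values = [[2.0, 0.0], [0.0, 3.0]]   # each row: the value it retrieves
print(ffn_as_memory([1.0, -1.0], keys, values))   # -> [2.0, 0.0]
```

Only the first key matches the input (the second has a negative dot product, zeroed by ReLU), so only its value is retrieved; that per-unit readability is what makes the memory view attractive for interpretability.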

FourierSampler: Unlocking Non-Autoregressive Potential in Diffusion Language Models via Frequency-Guided Generation

Siyang He, Qiqi Wang, Xiaoran Liu +8 more

Despite the non-autoregressive potential of diffusion language models (dLLMs), existing decoding strategies demonstrate positional bias, failing to fully unlock the potential of arbitrary generation. In this work, we delve into the inherent spectral characteristics of dLLMs and present the first fre...

diffusion language models · spectral characteristics · frequency-domain analysis · hidden states · low-frequency components (+5 more)
Jan 30, 2026

Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification

Chuxue Cao, Jinluan Yang, Haoran Li +8 more

Large Language Models (LLMs) show remarkable capabilities, yet their stochastic next-token prediction creates logical inconsistencies and reward hacking that formal symbolic systems avoid. To bridge this gap, we introduce a formal logic verification-guided framework that dynamically interleaves form...

large language models · formal logic verification · symbolic verification · natural language generation · reasoning chain (+5 more)
Jan 30, 2026

Residual Context Diffusion Language Models

Yuezhou Hu, Harman Singh, Monishwaran Maheswaran +10 more

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to purely autoregressive language models because they can decode multiple tokens in parallel. However, state-of-the-art block-wise dLLMs rely on a "remasking" mechanism that decodes only the most confident tokens and dis...

diffusion large language models · autoregressive language models · remasking mechanism · token representations · contextual residuals (+6 more)
Jan 30, 2026

Scaling Embeddings Outperforms Scaling Experts in Language Models

Hong Liu, Jiaqi Zhang, Chao Wang +13 more

While Mixture-of-Experts (MoE) architectures have become the standard for sparsity scaling in large language models, they increasingly face diminishing returns and system-level bottlenecks. In this work, we explore embedding scaling as a potent, orthogonal dimension for scaling sparsity. Through a c...

Mixture-of-Experts · sparsity scaling · embedding scaling · Pareto frontier · parameter budgeting (+5 more)
Jan 29, 2026

Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities

Shuangshuang Ying, Zheyu Wang, Yunjian Peng +16 more

Despite strong performance on existing benchmarks, it remains unclear whether large language models can reason over genuinely novel scientific information. Most evaluations score end-to-end RAG pipelines, where reasoning is confounded with retrieval and toolchain choices, and the signal is further c...

large language models · document-grounded reasoning · retrieval-augmented generation · deep search · multi-step synthesis (+14 more)
Jan 29, 2026

ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation

Zihao Huang, Jundong Zhou, Xingwei Qu +2 more

Large language models allocate uniform computation across all tokens, ignoring that some sequences are trivially predictable while others require deep reasoning. We introduce ConceptMoE, which dynamically merges semantically similar tokens into concept representations, performing implicit token-leve...

ConceptMoE · token-level compute allocation · semantically similar tokens · concept representations · learnable chunk module (+7 more)
Jan 29, 2026
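The core compression step described in the abstract, merging semantically similar adjacent tokens into a single concept vector, can be sketched with a cosine-similarity grouping rule. The threshold and mean-pooling choices below are illustrative assumptions, not ConceptMoE's learned chunk module.

```python
# Hedged sketch of token-to-concept compression: group runs of adjacent,
# highly similar token embeddings and average each run into one concept.
import math

def cos(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def compress(tokens, thresh=0.9):
    groups = [[tokens[0]]]
    for t in tokens[1:]:
        if cos(groups[-1][-1], t) >= thresh:
            groups[-1].append(t)        # similar -> extend current concept
        else:
            groups.append([t])          # dissimilar -> start a new concept
    # each concept vector is the mean of its group
    return [[sum(v[j] for v in g) / len(g) for j in range(len(g[0]))]
            for g in groups]

tokens = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]   # first two nearly identical
concepts = compress(tokens)
print(len(concepts))   # -> 2 (three tokens compressed to two concepts)
```

Downstream layers then attend over the shorter concept sequence, so trivially predictable spans implicitly receive less compute.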

Self-Improving Pretraining: using post-trained models to pretrain better models

Ellen Xiaoqing Tan, Shehzaad Dhuliawala, Jing Xu +4 more

Ensuring safety, factuality and overall quality in the generations of large language models is a critical challenge, especially as these models are increasingly deployed in real-world applications. The prevailing approach to addressing these issues involves collecting expensive, carefully curated da...

reinforcement learning · pretraining · next K generated tokens · model rollouts · safety (+2 more)
Jan 29, 2026

Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening

Xiaotong Ji, Rasul Tutunov, Matthieu Zimmer +1 more

Reinforcement learning (RL) post-training is a dominant approach for improving the reasoning performance of large language models (LLMs), yet growing evidence suggests that its gains arise primarily from distribution sharpening rather than the acquisition of new capabilities. Recent work has shown t...

reinforcement learning · large language models · distribution sharpening · Markov chain Monte Carlo · power distribution (+4 more)
Jan 29, 2026
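For a categorical distribution, the power distribution the tags refer to is simple: raise each probability to a power alpha and renormalize, which for alpha > 1 concentrates mass on already-likely tokens (equivalently, dividing logits by a temperature 1/alpha). The sketch shows only this basic sharpening identity, not the paper's MCMC sampler.

```python
# Sketch of distribution sharpening: sample from p^alpha / Z instead of p.

def sharpen(probs, alpha):
    powered = [p ** alpha for p in probs]
    z = sum(powered)                      # renormalizing constant
    return [p / z for p in powered]

p = [0.5, 0.3, 0.2]
sharp = sharpen(p, alpha=2.0)
print([round(x, 4) for x in sharp])   # -> [0.6579, 0.2368, 0.1053]
```

The argmax is unchanged but its mass grows, which mirrors the claim that RL-style gains can come from sharpening the base model's existing distribution rather than from new capabilities.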

Mechanistic Data Attribution: Tracing the Training Origins of Interpretable LLM Units

Jianhui Chen, Yuzhang Luo, Liangming Pan

While Mechanistic Interpretability has identified interpretable circuits in LLMs, their causal origins in training data remain elusive. We introduce Mechanistic Data Attribution (MDA), a scalable framework that employs Influence Functions to trace interpretable units back to specific training sample...

Mechanistic Interpretability · Influence Functions · interpretable units · training samples · Pythia family (+5 more)
Jan 29, 2026

FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale

Ajay Patel, Colin Raffel, Chris Callison-Burch

Due to limited supervised training data, large language models (LLMs) are typically pre-trained via a self-supervised "predict the next word" objective on a vast amount of unstructured text data. To make the resulting model useful to users, it is further trained on a far smaller amount of "instructi...

large language models · self-supervised learning · predict the next word · instruction-tuning · synthetic training data (+5 more)
Jan 29, 2026

Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

Jiecong Wang, Hao Peng, Chunyang Liu

Chain-of-Thought (CoT) empowers Large Language Models (LLMs) to tackle complex problems, but remains constrained by the computational cost and reasoning path collapse when grounded in discrete token spaces. Recent latent reasoning approaches attempt to optimize efficiency by performing reasoning wit...

Chain-of-Thought · Large Language Models · latent reasoning · discrete token spaces · continuous hidden states (+3 more)
Jan 29, 2026
Page 6 of 10