Latest Large Language Models Research Papers

Research on large language models including GPT, Claude, Llama, and other transformer-based architectures for natural language understanding and generation.

49 Papers
Showing 20 of 49 papers

Towards Automated Kernel Generation in the Era of LLMs

Yang Yu, Peiyu Zang, Chi Hsu Tsai +11 more

The performance of modern AI systems is fundamentally constrained by the quality of their underlying kernels, which translate high-level algorithmic semantics into low-level hardware operations. Achieving near-optimal kernels requires expert-level understanding of hardware architectures and programm...

large language models · LLM-based agents · kernel generation · kernel optimization · agentic systems +5 more
Jan 22, 2026 · 15

Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

Chenghao Fan, Wen Heng, Bo Li +6 more

Diffusion-based language models (DLLMs) offer non-sequential, block-wise generation and richer data reuse compared to autoregressive (AR) models, but existing code DLLMs still lag behind strong AR baselines under comparable budgets. We revisit this setting in a controlled study and introduce Stable-...

diffusion-based language models · autoregressive models · block diffusion · continual pretraining · warmup +5 more
Jan 22, 2026 · 46

From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models

Jiaxin Zhang, Wendi Cui, Zhuohang Li +4 more

While Large Language Models (LLMs) show remarkable capabilities, their unreliability remains a critical barrier to deployment in high-stakes domains. This survey charts a functional evolution in addressing this challenge: the evolution of uncertainty from a passive diagnostic metric to an active con...

Large Language Models · uncertainty · active control signal · advanced reasoning · autonomous agents +6 more
Jan 22, 2026 · 4
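
The survey's framing of uncertainty as an active control signal, rather than a passive diagnostic, can be illustrated with a minimal sketch: sample the model several times, measure disagreement over the answers, and gate the response on it. The entropy proxy and threshold below are illustrative assumptions, not anything the paper prescribes.

```python
import numpy as np

def predictive_entropy(answer_counts):
    """Entropy of the empirical answer distribution from repeated sampling,
    a common black-box uncertainty proxy."""
    p = np.asarray(answer_counts, dtype=float)
    p = p / p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def answer_or_abstain(answer_counts, threshold=0.7):
    """Use uncertainty actively: abstain (or escalate to retrieval / a human)
    when disagreement across samples exceeds a tuned threshold."""
    return "abstain" if predictive_entropy(answer_counts) > threshold else "answer"

# 10 samples: an 8/1/1 split has low entropy; a 4/3/3 split triggers abstention.
print(answer_or_abstain([8, 1, 1]))  # answer
print(answer_or_abstain([4, 3, 3]))  # abstain
```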

The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Zanlin Ni, Shenzhi Wang, Yang Yue +8 more

Diffusion Large Language Models (dLLMs) break the rigid left-to-right constraint of traditional LLMs, enabling token generation in arbitrary orders. Intuitively, this flexibility implies a solution space that strictly supersets the fixed autoregressive trajectory, theoretically unlocking superior re...

diffusion large language models · left-to-right constraint · token generation · reinforcement learning · reasoning potential +7 more
Jan 21, 2026 · 59

Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning

Zhihang Yuan, Chengyu Yue, Long Huang +2 more

Instruction tuning is a standard paradigm for adapting large language models (LLMs), but modern instruction datasets are large, noisy, and redundant, making full-data fine-tuning costly and often unnecessary. Existing data selection methods either build expensive gradient datastores or assign static...

instruction tuning · large language models · data selection · gradient datastore · LoRA ensemble +5 more
Jan 20, 2026 · 3
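
The title's gradient signal-to-noise criterion admits a simple reading, sketched here on toy data: score each example by how consistent its gradient estimate is across a small ensemble (the tags mention a LoRA ensemble), then keep only the top-scoring examples. The exact scoring rule is an assumption, not the paper's method.

```python
import numpy as np

def gradient_snr(per_member_grads):
    """SNR of one example's gradient across ensemble members: norm of the
    mean gradient over the average per-dimension std. A high ratio suggests
    a consistent, informative training signal."""
    g = np.asarray(per_member_grads, dtype=float)
    return np.linalg.norm(g.mean(axis=0)) / (g.std(axis=0).mean() + 1e-12)

def select_top_k(grads_per_example, k):
    """Keep the k examples with the highest gradient SNR."""
    scores = np.array([gradient_snr(g) for g in grads_per_example])
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4, 16))  # 100 examples, 4 members, 16-dim grads
print(select_top_k(data, k=10))
```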

InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning

Matthew Y. R. Yang, Hao Bai, Ian Wu +3 more

Outcome-reward reinforcement learning (RL) has proven effective at improving the reasoning capabilities of large language models (LLMs). However, standard RL assigns credit only at the level of the final answer, penalizing entire reasoning traces when the outcome is incorrect and uniformly reinforci...

reinforcement learning · credit assignment · process reward model · supervised fine-tuning · intervention training +3 more
Jan 20, 2026 · 4

DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution

Shengda Fan, Xuyan Ye, Yankai Lin

Self-play with large language models has emerged as a promising paradigm for achieving self-improving artificial intelligence. However, existing self-play frameworks often suffer from optimization instability, due to (i) non-stationary objectives induced by solver-dependent reward feedback for the Q...

self-play · large language models · optimization instability · non-stationary objectives · bootstrapping errors +10 more
Jan 20, 2026 · 14

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

Hengyuan Zhang, Zhihao Zhang, Mingyang Wang +25 more

Mechanistic Interpretability (MI) has emerged as a vital approach to demystify the opaque decision-making of Large Language Models (LLMs). However, existing reviews primarily treat MI as an observational science, summarizing analytical insights while lacking a systematic framework for actionable int...

Mechanistic Interpretability · Large Language Models · Localizing · Steering · Interpretable Objects +3 more
Jan 20, 2026 · 42

Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

Hyunjong Ok, Jaeho Lee

Large language models exhibit surprising sensitivity to the structure of the prompt, but the mechanisms underlying this sensitivity remain poorly understood. In this work, we conduct an in-depth investigation on a striking case: in multiple-choice question answering, placing context before the quest...

large language models · prompt structure · multiple-choice question answering · causal attention · causal mask +1 more
Jan 20, 2026 · 4
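
The order sensitivity the abstract investigates follows directly from the causal mask: in a question-first prompt, the question tokens can never attend to the context that comes after them. A toy illustration of that asymmetry:

```python
import numpy as np

def causal_mask(n):
    """Lower-triangular mask: position i may attend only to positions j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

# 4 context tokens C and 3 question tokens Q under the two prompt orders.
m = causal_mask(7)
# Context-first (C C C C Q Q Q): every Q position sees all of C.
print(m[4:, :4].all())   # True
# Question-first (Q Q Q C C C C): no Q position sees any of C.
print(m[:3, 3:].any())   # False
```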

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

Yuming Yang, Mingyoung Lai, Wanxu Zhao +13 more

Long chain-of-thought (CoT) trajectories provide rich supervision signals for distilling reasoning from teacher to student LLMs. However, both prior work and our experiments show that trajectories from stronger teachers do not necessarily yield better students, highlighting the importance of data-st...

reasoning trajectories · distillation · teacher-student LLMs · student likelihood · token-wise rank +2 more
Jan 20, 2026 · 6
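
The tags mention student likelihood and token-wise rank. One simple way to operationalize the latter (a possible reading of the metric, not necessarily the paper's) is to rank each teacher token under the student's next-token distribution; consistently low ranks mark a trajectory the student can follow. The sketch below uses random toy logits.

```python
import numpy as np

def token_ranks(student_logits, teacher_token_ids):
    """Rank of each teacher token in the student's next-token ordering.
    Rank 0 means the teacher's token was already the student's top choice."""
    order = np.argsort(-student_logits, axis=-1)  # best-first token indices
    return np.argmax(order == np.asarray(teacher_token_ids)[:, None], axis=-1)

rng = np.random.default_rng(1)
logits = rng.normal(size=(5, 50))       # toy: 5 steps, vocabulary of 50
teacher = rng.integers(0, 50, size=5)   # teacher trajectory token ids
print(token_ranks(logits, teacher))     # small values = better-aligned trace
```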

A BERTology View of LLM Orchestrations: Token- and Layer-Selective Probes for Efficient Single-Pass Classification

Gonzalo Ariel Meyoyan, Luciano Del Corro

Production LLM systems often rely on separate models for safety and other classification-heavy steps, increasing latency, VRAM footprint, and operational complexity. We instead reuse computation already paid for by the serving LLM: we train lightweight probes on its hidden states and predict labels ...

hidden states · token-layer hidden-state tensor · representation selection · two-stage aggregator · pooling +4 more
Jan 19, 2026 · 11
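
The single-pass idea is easy to sketch: reuse the hidden states the serving LLM already computes and attach a tiny classifier head. The pooled features, synthetic labels, and plain logistic-regression probe here are stand-ins for the paper's token- and layer-selective setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden_dim, n_train = 256, 2000

X = rng.normal(size=(n_train, hidden_dim))   # pooled hidden states (toy)
y = (X[:, :8].sum(axis=1) > 0).astype(int)   # synthetic "unsafe" labels

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", probe.score(X, y))

# At serving time the hidden state is already paid for; classification
# costs one extra matmul instead of a second model.
new_hidden = rng.normal(size=(1, hidden_dim))
print("flagged:", bool(probe.predict(new_hidden)[0]))
```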

MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models

Zecheng Tang, Baibei Ji, Ruoxi Sun +7 more

Existing works increasingly adopt memory-centric mechanisms to process long contexts in a segmented manner, and effective memory management is one of the key capabilities that enables large language models to effectively propagate information across the entire sequence. Therefore, leveraging reward mo...

memory-centric mechanisms · long-context comprehension · long-form generation · reward models · MemoryRewardBench +2 more
Jan 17, 2026 · 26

Language of Thought Shapes Output Diversity in Large Language Models

Shaoyang Xu, Wenxuan Zhang

Output diversity is crucial for Large Language Models as it underpins pluralism and creativity. In this work, we reveal that controlling the language used during model thinking (the language of thought) provides a novel and structural source of output diversity. Our preliminary study shows that differ...

language of thought · thinking space · multilingual thinking · Single-Language Sampling · Mixed-Language Sampling +3 more
Jan 16, 2026 · 3

NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Jiayu Liu, Rui Wang, Qing Zong +7 more

Accurately assessing model confidence is essential for deploying large language models (LLMs) in mission-critical factual domains. While retrieval-augmented generation (RAG) is widely adopted to improve grounding, confidence calibration in RAG settings remains poorly understood. We conduct a systema...

retrieval-augmented generation · confidence calibration · noise-aware calibration · supervised fine-tuning · ECE scores
Jan 16, 2026 · 25
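
The tags reference ECE scores. Expected Calibration Error itself is standard: bin predictions by stated confidence, then average the size-weighted gap between accuracy and mean confidence across bins.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin fraction) * |accuracy - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Overconfident toy model: claims 0.9 confidence but is right 60% of the time.
conf = np.full(1000, 0.9)
corr = np.random.default_rng(0).random(1000) < 0.6
print(round(expected_calibration_error(conf, corr), 3))  # ~0.3
```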

When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs

Zhongxiang Sun, Yi Zhan, Chenglei Shen +4 more

Personalized large language models (LLMs) adapt model behavior to individual users to enhance user satisfaction, yet personalization can inadvertently distort factual reasoning. We show that when personalized LLMs face factual queries, there exists a phenomenon where the model generates answers alig...

personalized large language models · factual reasoning · personalization-induced hallucinations · representational entanglement · Factuality-Preserving Personalized Steering +4 more
Jan 16, 2026 · 24
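
The method name in the tags, Factuality-Preserving Personalized Steering, suggests an activation-steering flavor. The generic pattern looks like the sketch below; the direction, the layer it is applied at, and the sign and scale of alpha are all assumptions rather than the paper's recipe.

```python
import numpy as np

def steer(hidden, direction, alpha):
    """Add a unit-norm steering vector to a layer's hidden states.
    Negative alpha pushes *away* from the direction, e.g. suppressing
    the persona's influence on factual queries."""
    d = direction / (np.linalg.norm(direction) + 1e-12)
    return hidden + alpha * d

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 64))          # toy: 4 token positions, 64-dim states
persona_dir = rng.normal(size=64)     # hypothetical persona direction
factual_h = steer(h, persona_dir, alpha=-2.0)
print(factual_h.shape)                # (4, 64)
```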

Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

Lecheng Yan, Ruizhe Li, Guanhua Chen +5 more

Reinforcement Learning with Verifiable Rewards (RLVR) is highly effective for enhancing LLM reasoning, yet recent evidence shows models like Qwen 2.5 achieve significant gains even with spurious or incorrect rewards. We investigate this phenomenon and identify a "Perplexity Paradox": spurious RLVR t...

Reinforcement Learning with Verifiable Rewards · perplexity · answer-token perplexity · prompt-side coherence · Path Patching +8 more
Jan 16, 2026 · 6
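
Answer-token perplexity, the quantity the abstract's "Perplexity Paradox" is stated in terms of, is exp of the negative mean log-probability restricted to the answer span; the numbers below are purely illustrative.

```python
import numpy as np

def answer_token_perplexity(token_logprobs):
    """exp(-mean log p(token)) over the answer tokens only.
    Lower = the model finds its own answer span more predictable."""
    return float(np.exp(-np.asarray(token_logprobs, dtype=float).mean()))

# Log-probs of the answer tokens before vs. after training (toy values).
before = [-2.1, -1.7, -2.4, -1.9]
after  = [-0.6, -0.4, -0.8, -0.5]
print(answer_token_perplexity(before))  # ~7.6
print(answer_token_perplexity(after))   # ~1.8
```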

Reasoning Models Generate Societies of Thought

Junsol Kim, Shiyang Lai, Nino Scherrer +2 more

Large language models have achieved remarkable capabilities across domains, yet mechanisms underlying sophisticated reasoning remain elusive. Recent reasoning models outperform comparable instruction-tuned models on complex cognitive tasks, attributed to extended computation through longer chains of...

large language models · reasoning models · chains of thought · multi-agent-like interactions · perspective diversity +8 more
Jan 15, 2026 · 7

PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution

Minghao Yan, Bo Peng, Benjamin Coleman +11 more

Large Language Models (LLMs) have emerged as powerful operators for evolutionary search, yet the design of efficient search scaffolds remains ad hoc. While promising, current LLM-in-the-loop systems lack a systematic approach to managing the evolutionary process. We identify three distinct failure m...

evolutionary search · large language models · context pollution · mode collapse · weak collaboration +5 more
Jan 15, 2026 · 18

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

Christina Lu, Jack Gallagher, Jonathan Michala +2 more

Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training. We investigate the structure of the space of model personas by extracting activation directions corresponding to diverse character archetypes. Across sever...

persona space · activation directions · Assistant Axis · persona drift · meta-reflection +1 more
Jan 15, 2026 · 9
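
A common recipe for extracting an activation direction such as the Assistant Axis is a difference of means between activations under contrasting persona prompts. Whether the paper uses exactly this construction is an assumption; the abstract only says directions are extracted for character archetypes.

```python
import numpy as np

def persona_direction(acts_a, acts_b):
    """Unit-norm difference-of-means direction between two activation sets."""
    d = np.mean(acts_a, axis=0) - np.mean(acts_b, axis=0)
    return d / np.linalg.norm(d)

rng = np.random.default_rng(0)
assistant_acts = rng.normal(loc=0.2, size=(50, 128))   # toy layer activations
pirate_acts    = rng.normal(loc=-0.2, size=(50, 128))
axis = persona_direction(assistant_acts, pirate_acts)

# Project a new activation onto the axis to locate a response between the
# default Assistant persona and the alternative archetype.
print(float(rng.normal(size=128) @ axis))
```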
Page 1 of 3