Latest Large Language Models Research Papers

Research on large language models including GPT, Claude, Llama, and other transformer-based architectures for natural language understanding and generation.

192 Papers

CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

Johannes Kirmayr, Lukas Stappen, Elisabeth André

Existing benchmarks for Large Language Model (LLM) agents focus on task completion under idealistic settings but overlook reliability in real-world, user-facing applications. In domains such as in-car voice assistants, users often issue incomplete or ambiguous requests, creating intrinsic uncertain...

Large Language Model agents, multi-turn dialogue, tool-using agents, in-car assistant, hallucination tasks, +4 more
Jan 29, 2026 | 77

Linear representations in language models can change dramatically over a conversation

Andrew Kyle Lampinen, Yuxuan Li, Eghbal Hosseini +2 more

Language model representations often contain linear directions that correspond to high-level concepts. Here, we study the dynamics of these representations: how representations evolve along these dimensions within the context of (simulated) conversations. We find that linear representations can chan...

linear directions, language model representations, conversation dynamics, factual representation, representational drift, +3 more
Jan 28, 2026 | 5

Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives

Tengyue Xu, Zhuoyang Qian, Gaoge Liu +16 more

Autonomous scientific discovery with large language model (LLM)-based agents has recently made substantial progress, demonstrating the ability to automate end-to-end research workflows. However, existing systems largely rely on runtime-centric execution paradigms, repeatedly reading, summarizing, an...

large language model, autonomous scientific discovery, runtime-centric execution, context window limitations, hallucination, +5 more
Jan 28, 2026 | 143

Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning

Minwu Kim, Safal Shrestha, Keith Ross

Reinforcement Learning with Verifiable Rewards (RLVR) has substantially improved the reasoning abilities of large language models (LLMs), yet training often stalls as problems become saturated. We identify the core challenge as the poor accessibility of informative failures: learning signals exist b...

reinforcement learning, large language models, reasoning abilities, training stagnation, informative failures, +6 more
Jan 28, 2026 | 4

Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report

Zhuoran Yang, Ed Li, Jianliang He +18 more

We present Foundation-Sec-8B-Reasoning, the first open-source native reasoning model for cybersecurity. Built upon our previously released Foundation-Sec-8B base model (derived from Llama-3.1-8B-Base), the model is trained through a two-stage process combining supervised fine-tuning (SFT) and reinfo...

supervised fine-tuning, reinforcement learning from verifiable rewards, cybersecurity analysis, multi-hop reasoning, safety performance
Jan 28, 2026 | 9

Persona Prompting as a Lens on LLM Social Reasoning

Jing Yang, Moritz Hechtbauer, Elisabeth Khalilov +3 more

For socially sensitive tasks like hate speech detection, the quality of explanations from Large Language Models (LLMs) is crucial for factors like user trust and model alignment. While Persona prompting (PP) is increasingly used as a way to steer models toward user-specific generation, its effect on...

Large Language Models, Persona prompting, hate speech detection, word-level rationales, demographic personas, +4 more
Jan 28, 2026 | 3

Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning

Kishan Panaganti, Zhenwen Liang, Wenhao Yu +2 more

Recent progress in Large Language Model (LLM) reasoning is increasingly driven by the refinement of post-training loss functions and alignment strategies. However, standard Reinforcement Learning (RL) paradigms like Group Relative Policy Optimization (GRPO) remain constrained by static uniformity: u...

Large Language Model, post-training, Reinforcement Learning, GRPO, Group Distributionally Robust Optimization, +7 more
Jan 27, 2026 | 7

Revisiting Parameter Server in LLM Post-Training

Xinyi Wan, Penghui Qi, Guangxing Huang +3 more

Modern data parallel (DP) training favors collective communication over parameter servers (PS) for its simplicity and efficiency under balanced workloads. However, the balanced workload assumption no longer holds in large language model (LLM) post-training due to the high variance in sequence length...

data parallel, parameter servers, collective communication, Fully Sharded Data Parallel, On-Demand Communication, +6 more
Jan 27, 2026 | 5

VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning

Vikash Singh, Darion Cassel, Nathaniel Weir +2 more

Despite the syntactic fluency of Large Language Models (LLMs), ensuring their logical correctness in high-stakes domains remains a fundamental challenge. We present a neurosymbolic framework that combines LLMs with SMT solvers to produce verification-guided answers through iterative refinement. Our ...

Large Language Models, SMT solvers, verification-guided answers, iterative refinement, atomic claims, +13 more
Jan 27, 2026 | 5

AACR-Bench: Evaluating Automatic Code Review with Holistic Repository-Level Context

Lei Zhang, Yongda Yu, Minghui Yu +11 more

High-quality evaluation benchmarks are pivotal for deploying Large Language Models (LLMs) in Automated Code Review (ACR). However, existing benchmarks suffer from two critical limitations: first, the lack of multi-language support in repository-level contexts, which restricts the generalizability of...

Large Language Models, Automated Code Review, evaluation benchmarks, cross-file context, AI-assisted annotation, +6 more
Jan 27, 2026 | 14

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

Shobhita Sundaram, John Quan, Ariel Kwiatkowski +3 more

Can a model learn to escape its own learning plateau? Reinforcement learning methods for finetuning large reasoning models stall on datasets with low initial success rates, and thus little training signal. We investigate a fundamental question: Can a pretrained LLM leverage latent knowledge to gener...

pretrained LLM, reinforcement learning, finetuning, meta-RL, automated curriculum, +8 more
Jan 26, 2026 | 25

Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models

Kunat Pipatanakul, Pittawat Taveekitworachai

Large language models (LLMs) have progressed rapidly; however, most state-of-the-art models are trained and evaluated primarily in high-resource languages such as English and Chinese, and are often developed by a small number of organizations with access to large-scale compute and data. This gatekee...

supervised fine-tuning, on-policy distillation, reinforcement fine-tuning, GRPO, InK-GRPO, +3 more
Jan 26, 2026 | 9

TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance

Elena Bruches, Vadim Alperovich, Dari Baturova +8 more

While Large Language Models (LLMs) have shown promise in software engineering, their application to unit testing remains largely confined to isolated test generation or oracle prediction, neglecting the broader challenge of test suite maintenance. We introduce TAM-Eval (Test Automated Maintenance Ev...

Large Language Models, unit testing, test suite maintenance, test automation, test generation, +8 more
Jan 26, 2026 | 3

FABLE: Forest-Based Adaptive Bi-Path LLM-Enhanced Retrieval for Multi-Document Reasoning

Lin Sun, Linglin Zhang, Jingang Huang +3 more

The rapid expansion of long-context Large Language Models (LLMs) has reignited debate on whether Retrieval-Augmented Generation (RAG) remains necessary. However, empirical evidence reveals persistent limitations of long-context inference, including the lost-in-the-middle phenomenon, high computation...

Retrieval-Augmented Generation, long-context Large Language Models, hierarchical forest indexes, bi-path strategy, LLM-guided hierarchical traversal, +3 more
Jan 26, 2026 | 11

Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction

Jang-Hyun Kim, Dongyoon Han, Sangdoo Yun

Efficient key-value (KV) cache management is crucial for the practical deployment of large language models (LLMs), yet existing compression techniques often incur a trade-off between performance degradation and computational overhead. We propose a novel gating-based KV cache eviction method for froz...

key-value cache, KV cache eviction, frozen-weight LLMs, sink-attention gating modules, forward passes, +4 more
Jan 25, 2026 | 3

CGPT: Cluster-Guided Partial Tables with LLM-Generated Supervision for Table Retrieval

Tsung-Hsiang Chou, Chen-Jui Yu, Shui-Hsiang Hsu +1 more

General-purpose embedding models have demonstrated strong performance in text retrieval but remain suboptimal for table retrieval, where highly structured content leads to semantic compression and query-table mismatch. Recent LLM-based retrieval augmentation methods mitigate this issue by generating...

embedding models, table retrieval, LLM-based retrieval augmentation, synthetic queries, hard-negative contrastive fine-tuning, +4 more
Jan 22, 2026 | 12

Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain

Özgür Uğur, Mahmut Göksu, Mahmut Çimen +6 more

This paper presents Mecellem models, a framework for developing specialized language models for the Turkish legal domain through domain adaptation strategies. We make two contributions: (1) Encoder Model Pre-trained from Scratch: ModernBERT-based bidirectional encoders pre-trained on a Turkish-domina...

domain adaptation, pre-trained models, encoder models, decoder models, continual pre-training, +17 more
Jan 22, 2026 | 7

From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models

Jiaxin Zhang, Wendi Cui, Zhuohang Li +4 more

While Large Language Models (LLMs) show remarkable capabilities, their unreliability remains a critical barrier to deployment in high-stakes domains. This survey charts a functional evolution in addressing this challenge: the evolution of uncertainty from a passive diagnostic metric to an active con...

Large Language Models, uncertainty, active control signal, advanced reasoning, autonomous agents, +6 more
Jan 22, 2026 | 4