Latest Large Language Models Research Papers

Research on large language models including GPT, Claude, Llama, and other transformer-based architectures for natural language understanding and generation.

192 Papers

Reasoning Models Generate Societies of Thought

Junsol Kim, Shiyang Lai, Nino Scherrer +2 more

Large language models have achieved remarkable capabilities across domains, yet mechanisms underlying sophisticated reasoning remain elusive. Recent reasoning models outperform comparable instruction-tuned models on complex cognitive tasks, attributed to extended computation through longer chains of...

large language models, reasoning models, chains of thought, multi-agent-like interactions, perspective diversity +8 more
Jan 15, 2026 · 7

PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution

Minghao Yan, Bo Peng, Benjamin Coleman +11 more

Large Language Models (LLMs) have emerged as powerful operators for evolutionary search, yet the design of efficient search scaffolds remains ad hoc. While promising, current LLM-in-the-loop systems lack a systematic approach to managing the evolutionary process. We identify three distinct failure m...

evolutionary search, large language models, context pollution, mode collapse, weak collaboration +5 more
Jan 15, 2026 · 18

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Tao Liu, Taiqiang Wu, Runming Yang +3 more

Supervised fine-tuning (SFT) is a fundamental post-training strategy to align Large Language Models (LLMs) with human intent. However, traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single reference answer, leading to the model overfitting to non-core ex...

supervised fine-tuning, Large Language Models, one-to-many nature, token probability, semantic importance +2 more
Jan 14, 2026 · 13
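
The token-selection idea in the ProFit abstract can be illustrated with a minimal sketch. The paper's actual selection criterion and loss are not specified in the teaser; here we assume, purely for illustration, that "high-value" reference tokens are those the model assigns low probability (i.e., the informative ones), and that the SFT loss is averaged over only those tokens. All function names and the `keep_fraction` parameter are hypothetical.

```python
import math

def select_tokens_by_probability(token_probs, keep_fraction=0.5):
    """Rank reference tokens by the model-assigned probability and keep the
    lowest-probability fraction (a stand-in for 'high-value' tokens; the
    actual ProFit criterion may differ)."""
    ranked = sorted(range(len(token_probs)), key=lambda i: token_probs[i])
    k = max(1, int(len(token_probs) * keep_fraction))
    return set(ranked[:k])

def masked_nll(token_probs, selected):
    """Average negative log-likelihood over the selected tokens only,
    so the loss ignores tokens the model already predicts confidently."""
    losses = [-math.log(token_probs[i]) for i in selected]
    return sum(losses) / len(losses)

# Toy per-token probabilities for one reference answer.
probs = [0.9, 0.05, 0.6, 0.02, 0.8]
sel = select_tokens_by_probability(probs, keep_fraction=0.4)
loss = masked_nll(probs, sel)
```

With `keep_fraction=0.4`, only the two least-probable tokens contribute to the loss, which is one simple way to keep training from overfitting to non-core surface expressions.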

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

Yao Tang, Li Dong, Yaru Hao +3 more

Large language models often solve complex reasoning tasks more effectively with Chain-of-Thought (CoT), but at the cost of long, low-bandwidth token sequences. Humans, by contrast, often reason softly by maintaining a distribution over plausible next steps. Motivated by this, we propose Multiplex Th...

Chain-of-Thought, stochastic soft reasoning, multiplex token, continuous multiplex token, on-policy reinforcement learning +3 more
Jan 13, 2026 · 33

Entropy Sentinel: Continuous LLM Accuracy Monitoring from Decoding Entropy Traces in STEM

Pedro Memoli Buffa, Luciano Del Corro

Deploying LLMs raises two coupled challenges: (1) monitoring, i.e., estimating where a model underperforms as traffic and domains drift, and (2) improvement, i.e., prioritizing data acquisition to close the largest performance gaps. We test whether an inference-time signal can estimate slice-level accuracy u...

output-entropy profile, next-token probabilities, final-layer top-k logprobs, instance correctness +4 more
Jan 13, 2026 · 15
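
The inference-time signal named in the Entropy Sentinel tags, a per-step entropy profile computed from top-k logprobs, can be sketched directly. This is a generic computation, not the paper's exact pipeline: the top-k distribution is renormalized over the retained mass before taking Shannon entropy, and the function names are illustrative.

```python
import math

def topk_entropy(logprobs):
    """Shannon entropy (in nats) of a next-token distribution truncated to
    its top-k logprobs, renormalized over the retained probability mass."""
    probs = [math.exp(lp) for lp in logprobs]
    z = sum(probs)
    probs = [p / z for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_trace(step_logprobs):
    """Per-decoding-step entropy profile for a generated sequence; summary
    statistics of this trace (e.g., the mean) give a simple slice-level
    confidence signal of the kind the abstract describes."""
    return [topk_entropy(lps) for lps in step_logprobs]

# A uniform top-4 distribution has maximal entropy, ln(4).
trace = entropy_trace([[0.0, 0.0, 0.0, 0.0], [0.0]])
```

A confident step (all mass on one token) yields entropy 0, while a uniform top-4 step yields ln(4) ≈ 1.386, so rising traces flag uncertain generations.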

RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation

Sunzhu Li, Jiale Zhao, Miteto Wei +6 more

Reinforcement Learning with Verifiable Rewards (RLVR) has driven substantial progress in reasoning-intensive domains like mathematics. However, optimizing open-ended generation remains challenging due to the lack of ground truth. While rubric-based evaluation offers a structured proxy for verificati...

Reinforcement Learning with Verifiable Rewards, rubric-based evaluation, principle-guided synthesis, multi-model aggregation, difficulty evolution +3 more
Jan 13, 2026 · 50

YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation

Abdelaziz Bounhar, Rania Hossam Elmohamady Elbadry, Hadi Abdine +3 more

Steering Large Language Models (LLMs) through activation interventions has emerged as a lightweight alternative to fine-tuning for alignment and personalization. Recent work on Bi-directional Preference Optimization (BiPO) shows that dense steering vectors can be learned directly from preference dat...

Large Language Models, activation interventions, fine-tuning, Bi-directional Preference Optimization, Direct Preference Optimization +13 more
Jan 13, 2026 · 6
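
The activation-intervention setup underlying YaPO can be sketched in a few lines. The general mechanism, shared by steering-vector methods such as BiPO, is to add a learned vector to a layer's hidden states at inference time; YaPO's contribution is making that vector sparse and learnable, which this toy sketch only imitates by zeroing most dimensions. Shapes, names, and the `alpha` scale are assumptions for illustration.

```python
import numpy as np

def apply_sparse_steering(hidden, vector, alpha=1.0):
    """Add a (sparse) steering vector to a layer's hidden states.
    hidden: (seq_len, d_model); vector: (d_model,), mostly zeros.
    Broadcasting applies the same shift at every sequence position."""
    return hidden + alpha * vector

d_model = 8
rng = np.random.default_rng(0)
h = rng.standard_normal((3, d_model))     # toy hidden states

v = np.zeros(d_model)
v[[1, 5]] = [0.5, -0.25]                  # sparse: only two active dimensions

steered = apply_sparse_steering(h, v, alpha=2.0)
```

Because the vector is sparse, the intervention perturbs only a few residual-stream dimensions, leaving the rest of the representation untouched, which is the lightweight-alternative-to-fine-tuning appeal the abstract describes.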

Enhancing Sentiment Classification and Irony Detection in Large Language Models through Advanced Prompt Engineering Techniques

Marvin Schmitt, Anne Schwerk, Sebastian Lempert

This study investigates the use of prompt engineering to enhance large language models (LLMs), specifically GPT-4o-mini and gemini-1.5-flash, in sentiment analysis tasks. It evaluates advanced prompting techniques like few-shot learning, chain-of-thought prompting, and self-consistency against a bas...

prompt engineering, large language models, few-shot learning, chain-of-thought prompting, self-consistency +3 more
Jan 13, 2026 · 4
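
Of the techniques this study evaluates, self-consistency is the simplest to sketch: sample several chain-of-thought completions independently, extract each one's final answer, and take the majority label. The aggregation below is the standard recipe, not this paper's specific implementation; the function name and the sampling step (omitted here) are illustrative.

```python
from collections import Counter

def self_consistency_vote(answers):
    """Aggregate final answers from independently sampled chain-of-thought
    completions by majority vote; ties resolve to the first-seen most
    common answer (Counter.most_common preserves insertion order)."""
    return Counter(answers).most_common(1)[0][0]

# Final labels extracted from five sampled completions of one review.
samples = ["positive", "negative", "positive", "positive", "neutral"]
label = self_consistency_vote(samples)   # → "positive"
```

For sentiment or irony classification, the vote is over a small label set, so even a handful of samples can smooth out an occasional faulty reasoning chain.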

MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences

Zizhen Li, Chuanhao Li, Yibin Wang +7 more

Recent advancements have expanded the role of Large Language Models in board games from playing agents to creative co-designers. However, a critical gap remains: current systems lack the capacity to offer constructive critique grounded in the emergent user experience. Bridging this gap is fundamenta...

Large Language Models, board games, Human-AI collaboration, creative co-designers, constructive critique +6 more
Jan 12, 2026 · 8
Page 9 of 10