Latest Large Language Models Research Papers

Research on large language models, including GPT, Claude, Llama, and other transformer-based architectures, for natural language understanding and generation.

189 Papers
Showing 20 of 189 papers

LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation

Ahmadreza Jeddi, Marco Ciccone, Babak Taati

Looped Transformers have emerged as an efficient and powerful class of models for reasoning in the language domain. Recent studies show that these models achieve strong performance on algorithmic and reasoning tasks, suggesting that looped architectures possess an inductive bias toward latent reason...

looped Transformers · reasoning · inductive bias · loop iterations · variable compute budgets · +5 more
Feb 11, 2026 · 14
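
The core mechanism this line of work builds on, a single weight-tied block unrolled for a variable number of iterations, can be sketched in a few lines of PyTorch. Everything below (class name, dimensions, loop counts) is an illustrative assumption, not LoopFormer's actual code:

```python
# Minimal sketch of a looped (weight-tied) Transformer: one shared block is
# applied repeatedly, so depth becomes a runtime knob rather than a fixed
# architectural choice.
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        # A single block whose weights are reused across loop iterations.
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )

    def forward(self, x, n_loops=4):
        # More loops = more latent-reasoning compute on the same parameters.
        for _ in range(n_loops):
            x = self.block(x)
        return x

x = torch.randn(2, 16, 256)      # (batch, seq, d_model)
model = LoopedTransformer()
shallow = model(x, n_loops=2)    # cheap inference
deep = model(x, n_loops=8)       # same weights, more compute
```

Because the loop count is a runtime argument, compute can be scaled per input, which is what makes "elastic depth" possible in the first place.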

When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning

Leheng Sheng, Yongtao Zhang, Wenchang Ma +6 more

While reasoning over long context is crucial for various real-world applications, it remains challenging for large language models (LLMs), which suffer performance degradation as the context length grows. Recent work, MemAgent, has tried to tackle this by processing context chunk-by-chunk in an ...

long-context reasoning · large language models · recurrent memory update · text-controlled gates · reward signals · +4 more
Feb 11, 2026 · 21
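
The chunk-by-chunk recurrent memory this abstract refers to is easy to sketch. The gate below is a plain learned sigmoid in the GRU style; the paper's text-controlled gates differ, and all module and variable names here are illustrative assumptions:

```python
# Hedged sketch of recurrent memory over a long context: process chunks in
# order, and let a gate decide how much of the fixed-size memory to overwrite.
import torch
import torch.nn as nn

class GatedMemoryUpdate(nn.Module):
    def __init__(self, d=512):
        super().__init__()
        self.gate = nn.Linear(2 * d, d)       # decides what to keep vs. rewrite
        self.candidate = nn.Linear(2 * d, d)  # proposes new memory content

    def forward(self, memory, chunk_summary):
        h = torch.cat([memory, chunk_summary], dim=-1)
        g = torch.sigmoid(self.gate(h))       # g ~ 1: memorize, g ~ 0: keep old
        new = torch.tanh(self.candidate(h))
        return g * new + (1 - g) * memory     # convex blend, GRU-style

mem = torch.zeros(1, 512)
update = GatedMemoryUpdate()
for chunk_summary in torch.randn(10, 1, 512):  # 10 context chunks in order
    mem = update(mem, chunk_summary)
```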

Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning

Dawid J. Kopiczko, Sagar Vaze, Tijmen Blankevoort +1 more

Supervised fine-tuning (SFT) on chain-of-thought data is an essential post-training step for reasoning language models. Standard machine learning intuition suggests that training with more unique training samples yields better generalization. Counterintuitively, we show that SFT benefits from repeti...

supervised fine-tuning · chain-of-thought data · reasoning language models · training epochs · token accuracy · +5 more
Feb 11, 2026 · 8

QP-OneModel: A Unified Generative LLM for Multi-Task Query Understanding in Xiaohongshu Search

Jianzhao Huang, Xiaorui Huang, Fei Zhao +9 more

Query Processing (QP) bridges user intent and content supply in large-scale Social Network Service (SNS) search engines. Traditional QP systems rely on pipelines of isolated discriminative models (e.g., BERT), suffering from limited semantic understanding and high maintenance overhead. While Large L...

Large Language Models · query processing · discriminative models · sequence generation · progressive three-stage alignment · +6 more
Feb 10, 2026 · 6

LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation

Zhiling Yan, Dingjie Song, Zhe Fang +4 more

The deployment of Large Language Models (LLMs) in high-stakes clinical settings demands rigorous and reliable evaluation. However, existing medical benchmarks remain static, suffering from two critical limitations: (1) data contamination, where test sets inadvertently leak into training corpora, lea...

Large Language Models · medical benchmarks · data contamination · temporal misalignment · clinical reasoning · +2 more
Feb 10, 2026 · 12

Latent Thoughts Tuning: Bridging Context and Reasoning with Fused Information in Latent Tokens

Weihao Liu, Dehai Min, Lu Cheng

While explicit Chain-of-Thought (CoT) equips Large Language Models (LLMs) with strong reasoning capabilities, it requires models to verbalize every intermediate step in text tokens, constraining the model's thoughts to the discrete vocabulary space. Recently, reasoning in continuous latent space has e...

Chain-of-Thought · Large Language Models · latent space · hidden states · vocabulary embedding space · +4 more
Feb 10, 2026 · 5
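
Latent-space reasoning in general (e.g., Coconut-style methods) replaces token sampling with feeding the last hidden state back in as the next input embedding, so the "thought" never passes through the discrete vocabulary. A toy sketch with a GRU standing in for the LLM backbone; every name here is ours, not the paper's:

```python
# Continuous latent reasoning, sketched: instead of decoding a text token at
# each step, project the last hidden state and feed it back as input.
import torch
import torch.nn as nn

d = 64
backbone = nn.GRU(d, d, batch_first=True)   # stand-in for an LLM backbone
to_embedding = nn.Linear(d, d)              # hidden state -> next input

prompt_embs = torch.randn(1, 8, d)          # embedded prompt tokens
out, h = backbone(prompt_embs)

latent_thoughts = []
for _ in range(4):                          # 4 latent reasoning steps
    step_in = to_embedding(out[:, -1:, :])  # feed hidden state back in
    out, h = backbone(step_in, h)
    latent_thoughts.append(out[:, -1, :])
# After the latent steps, ordinary token decoding would resume from `h`.
```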

Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models

Shiting Huang, Zecheng Li, Yu Zeng +7 more

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an effective approach for enhancing the reasoning capabilities of Large Language Models (LLMs). Despite its efficacy, RLVR faces a meta-learning bottleneck: it lacks mechanisms for error attribution and experience internalization i...

reinforcement learning · large language models · meta-experience · self-verification · contrastive analysis · +4 more
Feb 10, 2026 · 9

iGRPO: Self-Feedback-Driven LLM Reasoning

Ali Hatamizadeh, Shrimai Prabhumoye, Igor Gitman +5 more

Large Language Models (LLMs) have shown promise in solving complex mathematical problems, yet they still fall short of producing accurate and consistent solutions. Reinforcement Learning (RL) is a framework for aligning these models with task-specific rewards, improving overall quality and reliabili...

Reinforcement Learning · Proximal Policy Optimization · Group Relative Policy Optimization · iterative policy optimization · mathematical reasoning · +4 more
Feb 9, 2026 · 13
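
For readers unfamiliar with GRPO: its defining step is replacing a learned critic with group-normalized rewards. Below is a minimal sketch of that advantage computation; the iterative self-feedback of "iGRPO" wraps around updates like this, and the variable names are ours:

```python
# GRPO's core trick: sample a group of responses per prompt and normalize each
# reward against the group's mean and std, so no value function is needed.
import torch

def group_relative_advantages(rewards, eps=1e-6):
    """rewards: (num_prompts, group_size) scalar rewards per sampled response."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)       # per-response advantage

rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],   # prompt 1: two correct answers
                        [0.0, 0.0, 0.0, 1.0]])  # prompt 2: one correct answer
adv = group_relative_advantages(rewards)
# The lone correct answer on the harder prompt earns the larger advantage.
```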

OPE: Overcoming Information Saturation in Parallel Thinking via Outline-Guided Path Exploration

Qi Guo, Jianing Wang, Deyang Kong +7 more

Parallel thinking has emerged as a new paradigm for large reasoning models (LRMs) in tackling complex problems. Recent methods leverage Reinforcement Learning (RL) to enhance parallel thinking, aiming to address the limitations in computational resources and effectiveness encountered with supervised...

parallel thinking · large reasoning models · Reinforcement Learning · RLVR · mutual information bottleneck · +5 more
Feb 9, 2026 · 4

LLaDA2.1: Speeding Up Text Diffusion via Token Editing

Tiwei Bie, Maosong Cao, Xiang Cao +47 more

While LLaDA2.0 showcased the scaling potential of 100B-level block-diffusion models and their inherent parallelization, the delicate equilibrium between decoding speed and generation quality has remained an elusive frontier. Today, we unveil LLaDA2.1, a paradigm shift designed to transcend this trad...

block-diffusion models · decoding speed · generation quality · Token-to-Token editing · Mask-to-Token scheme · +11 more
Feb 9, 2026 · 35
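
As background, a Mask-to-Token scheme like the one named in the tags follows the general confidence-thresholded parallel-decoding recipe (MaskGIT-style). A toy sketch with a random stand-in model and illustrative names; the Token-to-Token editing that LLaDA2.1 adds, revising already-committed tokens, is not shown:

```python
# Confidence-based parallel decoding for masked text diffusion: predict all
# masked positions at once, commit the confident ones, repeat on the rest.
import torch

MASK = -1

def diffusion_decode(model, ids, steps=4, threshold=0.5):
    for _ in range(steps):
        masked = ids == MASK
        if not masked.any():
            break
        probs = model(ids)                    # (seq, vocab) token distributions
        conf, pred = probs.max(dim=-1)
        commit = masked & (conf > threshold)  # only fill confident positions
        ids = torch.where(commit, pred, ids)
    return ids

vocab = 100
toy_model = lambda ids: torch.softmax(torch.randn(ids.shape[0], vocab), dim=-1)
# Low threshold so the random toy model actually commits tokens.
print(diffusion_decode(toy_model, torch.full((16,), MASK), threshold=0.05))
```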

Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models

Mingzi Cao, Xingwei Tan, Mahmud Akhter +4 more

Deduction, induction, and abduction are fundamental reasoning paradigms at the core of human logical thinking. Although improving Large Language Model (LLM) reasoning has attracted significant research efforts, the extent to which these fundamental paradigms induce generalization has yet to be systematical...

Large Language Model · reasoning paradigms · fine-tuning · mixture-of-experts · out-of-domain tasks · +1 more
Feb 9, 2026 · 10

Weak-Driven Learning: How Weak Agents Make Strong Agents Stronger

Zehao Chen, Gongxun Li, Tianxiang Ai +8 more

As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing returns. While existing methods continue to reinforce target predictions, we find that informative s...

post-training optimization · large language models · saturation bottleneck · weak checkpoints · entropy dynamics · +2 more
Feb 9, 2026 · 30
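
The saturation the abstract observes has a simple observable: as a model grows confident, the entropy of its next-token distribution collapses, and reinforcing the target yields little further gradient. A quick diagnostic sketch (ours, not the paper's training method):

```python
# Measure mean next-token entropy as a saturation signal.
import torch

def mean_token_entropy(logits):
    """logits: (batch, seq, vocab). Low entropy => saturated, little to learn."""
    logp = torch.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1).mean()

confident = torch.randn(2, 8, 1000) * 10.0   # sharply peaked predictions
uncertain = torch.randn(2, 8, 1000) * 0.1    # near-uniform predictions
print(mean_token_entropy(confident), mean_token_entropy(uncertain))
```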

How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs

Yapei Chang, Kyle Lo, Mohit Iyyer +1 more

Generating step-by-step "how-to" procedures is a key LLM capability: how-to advice is commonly requested in chatbots, and step-by-step planning is critical for reasoning over complex tasks. Yet, measuring and improving procedural validity at scale on real-world tasks remains challenging and understu...

goal-conditioned procedure generation · How2Mine · How2Bench · How2Score · LLM judge · +4 more
Feb 9, 2026 · 5

Improving Data and Reward Design for Scientific Reasoning in Large Language Models

Zijie Chen, Zhenghao Lin, Xiao Liu +3 more

Solving open-ended science questions remains challenging for large language models, particularly due to inherently unreliable supervision and evaluation. The bottleneck lies in the data construction and reward design for scientific post-training. We develop a large-scale, systematic data processing ...

large language models · open-ended science questions · data construction · reward design · Dr. SCI dataset · +8 more
Feb 9, 2026 · 35

Prism: Spectral-Aware Block-Sparse Attention

Xinghao Wang, Pengyu Wang, Xiaoran Liu +4 more

Block-sparse attention is promising for accelerating long-context LLM pre-filling, yet identifying relevant blocks efficiently remains a bottleneck. Existing methods typically employ coarse-grained attention as a proxy for block importance estimation, but often resort to expensive token-level search...

block-sparse attention · long-context LLM · pre-filling · coarse-grained attention · Rotary Positional Embeddings · +8 more
Feb 9, 2026 · 30
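
The coarse-grained proxy the abstract criticizes typically looks like this: pool queries and keys into blocks, score block pairs cheaply, and keep only the top-k key blocks per query block. A minimal sketch with illustrative names; Prism's spectral-aware scoring is precisely what this naive mean-pooling lacks:

```python
# Coarse-grained block selection for block-sparse attention.
import torch

def select_blocks(q, k, block=64, keep=4):
    """q, k: (seq, d). Returns (n_blocks, keep) indices of key blocks to attend."""
    n = q.shape[0] // block
    qb = q[: n * block].view(n, block, -1).mean(dim=1)  # pooled query blocks
    kb = k[: n * block].view(n, block, -1).mean(dim=1)  # pooled key blocks
    scores = qb @ kb.T                                  # (n_blocks, n_blocks)
    return scores.topk(keep, dim=-1).indices            # blocks worth attending

q, k = torch.randn(1024, 128), torch.randn(1024, 128)
print(select_blocks(q, k))  # exact attention then runs only on these blocks
```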

Effective Reasoning Chains Reduce Intrinsic Dimensionality

Archiki Prasad, Mandar Joshi, Kenton Lee +2 more

Chain-of-thought (CoT) reasoning and its variants have substantially improved the performance of language models on complex reasoning tasks, yet the precise mechanisms by which different strategies facilitate generalization remain poorly understood. While current explanations often point to increase...

chain-of-thought · language models · reasoning strategies · intrinsic dimensionality · generalization · +4 more
Feb 9, 2026 · 7
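
Intrinsic dimensionality is a measurable quantity; a common estimator is TwoNN (Facco et al., 2017), which fits the per-point ratio of second- to first-nearest-neighbor distances. A bare-bones version without the usual outlier trimming, for illustration only; applying it to hidden states of reasoning traces is the kind of measurement a claim like this rests on:

```python
# TwoNN intrinsic-dimensionality estimate (maximum-likelihood form).
import numpy as np

def twonn_intrinsic_dim(X):
    """X: (n_points, n_features). Returns a scalar intrinsic-dim estimate."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    two = np.sort(d, axis=1)[:, :2]      # 1st and 2nd neighbor distances
    mu = two[:, 1] / two[:, 0]           # ratio r2 / r1 per point
    return len(mu) / np.sum(np.log(mu))  # MLE under the Pareto fit

X = np.random.randn(300, 3) @ np.random.randn(3, 64)  # 3-dim data in 64 dims
print(twonn_intrinsic_dim(X))            # ~3 despite the 64-dim ambient space
```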

Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning

Zhuoen Chen, Dongfang Li, Meishan Zhang +2 more

Large Language Models (LLMs) face significant challenges in long-context processing, including quadratic computational costs, information forgetting, and the context fragmentation inherent in retrieval-augmented generation (RAG). We propose a cognitively inspired framework for efficient long-context...

large language models · long-context processing · retrieval-augmented generation · chunk-wise compression · selective memory recall · +8 more
Feb 9, 2026 · 8

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Zixuan Huang, Xin Xia, Yuxi Ren +11 more

Recent advancements in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through Long Chains of Thought (CoTs). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-ti...

large reasoning models · chains of thought · sampling paradigms · self-aware guided efficient reasoning · group-based reinforcement learning · +2 more
Feb 9, 2026 · 170

Beyond Correctness: Learning Robust Reasoning via Transfer

Hyunseok Lee, Soheil Abbasloo, Jihoon Tack +1 more

Reinforcement Learning with Verifiable Rewards (RLVR) has recently strengthened LLM reasoning, but its focus on final-answer correctness leaves a critical gap: it does not ensure the robustness of the reasoning process itself. We adopt a simple philosophical view: robust reasoning should remain usef...

Reinforcement Learning with Verifiable Rewards · Reinforcement Learning with Transferable Reward · LLM reasoning · transfer reward · reasoning robustness · +5 more
Feb 9, 2026 · 4

MemFly: On-the-Fly Memory Optimization via Information Bottleneck

Zhenyuan Zhang, Xianzhang Jia, Zhiqin Yang +4 more

Long-term memory enables large language model agents to tackle complex tasks through historical interactions. However, existing frameworks encounter a fundamental dilemma between compressing redundant information efficiently and maintaining precise retrieval for downstream tasks. To bridge this gap,...

information bottleneck · long-term memory · language model agents · memory coherence · response fidelity · +5 more
Feb 8, 2026 · 7
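
For reference, the classic information-bottleneck objective the title invokes, written with illustrative symbols (M = stored memory, H = interaction history, T = downstream-task signal; not necessarily the paper's notation):

    min over p(M|H) of  I(M; H) − β · I(M; T)

The first term rewards compressing the history; the second preserves what later tasks need; β sets the trade-off between the compression and precise retrieval that the abstract calls a fundamental dilemma.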
Page 3 of 10