Latest Large Language Models Research Papers

Research on large language models including GPT, Claude, Llama, and other transformer-based architectures for natural language understanding and generation.

192 Papers

Reasoning Models Generate Societies of Thought

Junsol Kim, Shiyang Lai, Nino Scherrer +2 more

Large language models have achieved remarkable capabilities across domains, yet mechanisms underlying sophisticated reasoning remain elusive. Recent reasoning models outperform comparable instruction-tuned models on complex cognitive tasks, attributed to extended computation through longer chains of...

large language models, reasoning models, chains of thought, multi-agent-like interactions, perspective diversity +8 more
Jan 15, 2026 · 7

PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution

Minghao Yan, Bo Peng, Benjamin Coleman +11 more

Large Language Models (LLMs) have emerged as powerful operators for evolutionary search, yet the design of efficient search scaffolds remains ad hoc. While promising, current LLM-in-the-loop systems lack a systematic approach to managing the evolutionary process. We identify three distinct failure m...

evolutionary search, large language models, context pollution, mode collapse, weak collaboration +5 more
Jan 15, 2026 · 18

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Tao Liu, Taiqiang Wu, Runming Yang +3 more

Supervised fine-tuning (SFT) is a fundamental post-training strategy to align Large Language Models (LLMs) with human intent. However, traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single reference answer, leading to the model overfitting to non-core ex...

supervised fine-tuning, Large Language Models, one-to-many nature, token probability, semantic importance +2 more
Jan 14, 2026 · 13
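
The token-selection idea in the ProFit abstract can be illustrated with a minimal sketch. The paper's actual selection criterion and loss are not specified in the teaser; here we assume, purely for illustration, that "high-value" reference tokens are those the model assigns low probability (i.e., the informative ones), and that the SFT loss is averaged over only those tokens. All function names and the `keep_fraction` parameter are hypothetical.

```python
import math

def select_tokens_by_probability(token_probs, keep_fraction=0.5):
    """Rank reference tokens by the model-assigned probability and keep the
    lowest-probability fraction (a stand-in for 'high-value' tokens; the
    actual ProFit criterion may differ)."""
    ranked = sorted(range(len(token_probs)), key=lambda i: token_probs[i])
    k = max(1, int(len(token_probs) * keep_fraction))
    return set(ranked[:k])

def masked_nll(token_probs, selected):
    """Average negative log-likelihood over the selected tokens only,
    so the loss ignores tokens the model already predicts confidently."""
    losses = [-math.log(token_probs[i]) for i in selected]
    return sum(losses) / len(losses)

# Toy per-token probabilities for one reference answer.
probs = [0.9, 0.05, 0.6, 0.02, 0.8]
sel = select_tokens_by_probability(probs, keep_fraction=0.4)
loss = masked_nll(probs, sel)
```

With `keep_fraction=0.4`, only the two least-probable tokens contribute to the loss, which is one simple way to keep training from overfitting to non-core surface expressions.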

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

Yao Tang, Li Dong, Yaru Hao +3 more

Large language models often solve complex reasoning tasks more effectively with Chain-of-Thought (CoT), but at the cost of long, low-bandwidth token sequences. Humans, by contrast, often reason softly by maintaining a distribution over plausible next steps. Motivated by this, we propose Multiplex Th...

Chain-of-Thought, stochastic soft reasoning, multiplex token, continuous multiplex token, on-policy reinforcement learning +3 more
Jan 13, 2026 · 33

Entropy Sentinel: Continuous LLM Accuracy Monitoring from Decoding Entropy Traces in STEM

Pedro Memoli Buffa, Luciano Del Corro

Deploying LLMs raises two coupled challenges: (1) monitoring, i.e., estimating where a model underperforms as traffic and domains drift, and (2) improvement, i.e., prioritizing data acquisition to close the largest performance gaps. We test whether an inference-time signal can estimate slice-level accuracy u...

output-entropy profile, next-token probabilities, final-layer top-k logprobs, instance correctness +4 more
Jan 13, 2026 · 15
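
The inference-time signal named in the Entropy Sentinel tags, a per-step entropy profile computed from top-k logprobs, can be sketched directly. This is a generic computation, not the paper's exact pipeline: the top-k distribution is renormalized over the retained mass before taking Shannon entropy, and the function names are illustrative.

```python
import math

def topk_entropy(logprobs):
    """Shannon entropy (in nats) of a next-token distribution truncated to
    its top-k logprobs, renormalized over the retained probability mass."""
    probs = [math.exp(lp) for lp in logprobs]
    z = sum(probs)
    probs = [p / z for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_trace(step_logprobs):
    """Per-decoding-step entropy profile for a generated sequence; summary
    statistics of this trace (e.g., the mean) give a simple slice-level
    confidence signal of the kind the abstract describes."""
    return [topk_entropy(lps) for lps in step_logprobs]

# A uniform top-4 distribution has maximal entropy, ln(4).
trace = entropy_trace([[0.0, 0.0, 0.0, 0.0], [0.0]])
```

A confident step (all mass on one token) yields entropy 0, while a uniform top-4 step yields ln(4) ≈ 1.386, so rising traces flag uncertain generations.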

RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation

Sunzhu Li, Jiale Zhao, Miteto Wei +6 more

Reinforcement Learning with Verifiable Rewards (RLVR) has driven substantial progress in reasoning-intensive domains like mathematics. However, optimizing open-ended generation remains challenging due to the lack of ground truth. While rubric-based evaluation offers a structured proxy for verificati...

Reinforcement Learning with Verifiable Rewards, rubric-based evaluation, principle-guided synthesis, multi-model aggregation, difficulty evolution +3 more
Jan 13, 2026 · 50

YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation

Abdelaziz Bounhar, Rania Hossam Elmohamady Elbadry, Hadi Abdine +3 more

Steering Large Language Models (LLMs) through activation interventions has emerged as a lightweight alternative to fine-tuning for alignment and personalization. Recent work on Bi-directional Preference Optimization (BiPO) shows that dense steering vectors can be learned directly from preference dat...

Large Language Models, activation interventions, fine-tuning, Bi-directional Preference Optimization, Direct Preference Optimization +13 more
Jan 13, 2026 · 6
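
The activation-intervention setup underlying YaPO can be sketched in a few lines. The general mechanism, shared by steering-vector methods such as BiPO, is to add a learned vector to a layer's hidden states at inference time; YaPO's contribution is making that vector sparse and learnable, which this toy sketch only imitates by zeroing most dimensions. Shapes, names, and the `alpha` scale are assumptions for illustration.

```python
import numpy as np

def apply_sparse_steering(hidden, vector, alpha=1.0):
    """Add a (sparse) steering vector to a layer's hidden states.
    hidden: (seq_len, d_model); vector: (d_model,), mostly zeros.
    Broadcasting applies the same shift at every sequence position."""
    return hidden + alpha * vector

d_model = 8
rng = np.random.default_rng(0)
h = rng.standard_normal((3, d_model))     # toy hidden states

v = np.zeros(d_model)
v[[1, 5]] = [0.5, -0.25]                  # sparse: only two active dimensions

steered = apply_sparse_steering(h, v, alpha=2.0)
```

Because the vector is sparse, the intervention perturbs only a few residual-stream dimensions, leaving the rest of the representation untouched, which is the lightweight-alternative-to-fine-tuning appeal the abstract describes.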

Enhancing Sentiment Classification and Irony Detection in Large Language Models through Advanced Prompt Engineering Techniques

Marvin Schmitt, Anne Schwerk, Sebastian Lempert

This study investigates the use of prompt engineering to enhance large language models (LLMs), specifically GPT-4o-mini and gemini-1.5-flash, in sentiment analysis tasks. It evaluates advanced prompting techniques like few-shot learning, chain-of-thought prompting, and self-consistency against a bas...

prompt engineering, large language models, few-shot learning, chain-of-thought prompting, self-consistency +3 more
Jan 13, 2026 · 4
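
Of the techniques this study evaluates, self-consistency is the simplest to sketch: sample several chain-of-thought completions independently, extract each one's final answer, and take the majority label. The aggregation below is the standard recipe, not this paper's specific implementation; the function name and the sampling step (omitted here) are illustrative.

```python
from collections import Counter

def self_consistency_vote(answers):
    """Aggregate final answers from independently sampled chain-of-thought
    completions by majority vote; ties resolve to the first-seen most
    common answer (Counter.most_common preserves insertion order)."""
    return Counter(answers).most_common(1)[0][0]

# Final labels extracted from five sampled completions of one review.
samples = ["positive", "negative", "positive", "positive", "neutral"]
label = self_consistency_vote(samples)   # → "positive"
```

For sentiment or irony classification, the vote is over a small label set, so even a handful of samples can smooth out an occasional faulty reasoning chain.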

MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences

Zizhen Li, Chuanhao Li, Yibin Wang +7 more

Recent advancements have expanded the role of Large Language Models in board games from playing agents to creative co-designers. However, a critical gap remains: current systems lack the capacity to offer constructive critique grounded in the emergent user experience. Bridging this gap is fundamenta...

Large Language Models, board games, Human-AI collaboration, creative co-designers, constructive critique +6 more
Jan 12, 2026 · 8
Page 9 of 10