Latest Large Language Models Research Papers

Research on large language models including GPT, Claude, Llama, and other transformer-based architectures for natural language understanding and generation.

189 Papers

Bielik Guard: Efficient Polish Language Safety Classifiers for LLM Content Moderation

Krzysztof Wróbel, Jan Maria Kowalski, Jerzy Surma +2 more

As Large Language Models (LLMs) become increasingly deployed in Polish language applications, the need for efficient and accurate content safety classifiers has become paramount. We present Bielik Guard, a family of compact Polish language safety classifiers comprising two model variants: a 0.1B par...

Large Language Models · content safety classifiers · MMLW-RoBERTa-base · PKOBP/polish-roberta-8k · fine-tuned models · +3 more

Feb 8, 2026 · 4

MemFly: On-the-Fly Memory Optimization via Information Bottleneck

Zhenyuan Zhang, Xianzhang Jia, Zhiqin Yang +4 more

Long-term memory enables large language model agents to tackle complex tasks through historical interactions. However, existing frameworks encounter a fundamental dilemma between compressing redundant information efficiently and maintaining precise retrieval for downstream tasks. To bridge this gap,...

information bottleneck · long-term memory · language model agents · memory coherence · response fidelity · +5 more

Feb 8, 2026 · 7

Free(): Learning to Forget in Malloc-Only Reasoning Models

Yilun Zheng, Dongyang Ma, Tian Liang +5 more

Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as "malloc-only" engines, continuously acc...

LLMs · reasoning models · test-time compute · thinking tokens · malloc-only engines · +8 more

Feb 8, 2026 · 5

Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math

Guijin Son, Donghun Yang, Hitesh Laxmichand Patel +5 more

Recent progress in reasoning models suggests that generating plausible attempts for research-level mathematics may be within reach, but verification remains a bottleneck, consuming scarce expert time. We hypothesize that a meaningful solution should contain enough method-level information that, when...

reasoning models · research-level mathematics · verification · oracle-free evaluator · in-context exemplar · +6 more

Feb 6, 2026 · 14

QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining

Jun Han, Shuo Zhang, Wei Li +21 more

Financial markets are noisy and non-stationary, making alpha mining highly sensitive to noise in backtesting results and sudden market regime shifts. While recent agentic frameworks improve alpha mining automation, they often lack controllable multi-round search and reliable reuse of validated exper...

Feb 6, 2026 · 76

Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making

Baichuan-M3 Team, Chengfeng Dou, Fan Yang +15 more

We introduce Baichuan-M3, a medical-enhanced large language model engineered to shift the paradigm from passive question-answering to active, clinical-grade decision support. Addressing the limitations of existing systems in open-ended consultations, Baichuan-M3 utilizes a specialized training pipel...

large language model · clinical decision support · proactive information acquisition · long-horizon reasoning · hallucination suppression · +3 more

Feb 6, 2026 · 53

LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning

Xinwu Ye, Yicheng Mao, Jia Zhang +16 more

Chemical large language models (LLMs) predominantly rely on explicit Chain-of-Thought (CoT) in natural language to perform complex reasoning. However, chemical reasoning is inherently continuous and structural, and forcing it into discrete linguistic tokens introduces a fundamental representation mi...

chemical large language models · Chain-of-Thought · latent reasoning · continuous latent space · textual generation · +3 more

Feb 6, 2026 · 15

CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs

Haoran Li, Sucheng Ren, Alan Yuille +1 more

Rotary Positional Embedding (RoPE) is a key component of context scaling in Large Language Models (LLMs). While various methods have been proposed to adapt RoPE to longer contexts, their guiding principles generally fall into two categories: (1) out-of-distribution (OOD) mitigation, which scales RoP...

Rotary Positional Embedding · context scaling · Large Language Models · out-of-distribution mitigation · Semantic Modeling · +4 more

Feb 5, 2026 · 6

DFlash: Block Diffusion for Flash Speculative Decoding

Jian Chen, Yesheng Liang, Zhijian Liu

Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GPU utilization. Speculative decoding mitigates this bottleneck by using a fast draft model whose outputs are verified in parallel by the targ...

autoregressive large language models · speculative decoding · diffusion LLMs · block diffusion model · parallel generation · +5 more

Feb 5, 2026 · 35

Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better

Ji Zhao, Yufei Gu, Shitong Shao +3 more

As Large Language Models (LLMs) achieve remarkable empirical success through scaling model and data size, pretraining has become increasingly critical yet computationally prohibitive, hindering rapid development. Despite the availability of numerous pretrained LLMs developed at significant computati...

Large Language Models · pretraining · late-to-early training · late-to-early-step learning · late-to-early-layer learning · +4 more

Feb 5, 2026 · 4

Multi-Task GRPO: Reliable LLM Reasoning Across Tasks

Shyam Sundhar Ramesh, Xiaotong Ji, Matthieu Zimmer +5 more

RL-based post-training with GRPO is widely used to improve large language models on individual reasoning tasks. However, real-world deployment requires reliable performance across diverse tasks. A straightforward multi-task adaptation of GRPO often leads to imbalanced outcomes, with some tasks domin...

GRPO · multi-task adaptation · worst-task performance · policy gradients · ratio-preserving sampler · +3 more

Feb 5, 2026 · 7

Large Language Model Reasoning Failures

Peiyang Song, Pengrui Han, Noah Goodman

Large Language Models (LLMs) have exhibited remarkable reasoning capabilities, achieving impressive results across a wide range of tasks. Despite these advances, significant reasoning failures persist, occurring even in seemingly simple scenarios. To systematically understand and address these short...

large language models · reasoning capabilities · reasoning failures · embodied reasoning · non-embodied reasoning · +5 more

Feb 5, 2026 · 7

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Shaobo Wang, Xuan Ouyang, Tianyi Xu +9 more

As high-quality public text approaches exhaustion, a phenomenon known as the Data Wall, pre-training is shifting from more tokens to better tokens. However, existing methods either rely on heuristic static filters that ignore training dynamics, or use dynamic yet optimizer-agnostic criteria based on...

data selection · optimizer-induced update space · effective updates · stable in-distribution proxy · Ghost technique · +8 more

Feb 5, 2026 · 260

OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

Fangzhi Xu, Hang Yan, Qiushi Sun +16 more

The rapid advancement of Large Language Models (LLMs) has catalyzed the development of autonomous agents capable of navigating complex environments. However, existing evaluations primarily adopt a deductive paradigm, where agents execute tasks based on explicitly provided rules and static goals, oft...

Large Language Models · autonomous agents · inductive reasoning · long-horizon planning · agent evaluation · +4 more

Feb 5, 2026 · 49

Privileged Information Distillation for Language Models

Emiliano Penaloza, Dheeraj Vattikonda, Nicolas Gontier +3 more

Training-time privileged information (PI) can enable language models to succeed on tasks they would otherwise fail, making it a powerful tool for reinforcement learning in hard, long-horizon settings. However, transferring capabilities learned with PI to policies that must act without it at inferenc...

privileged information · reinforcement learning · distillation · teacher-student objective · on-policy self-distillation · +2 more

Feb 4, 2026 · 22

Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning

Yu-Ang Lee, Ching-Yun Ko, Pin-Yu Chen +1 more

Low-Rank Adaptation (LoRA) is the prevailing approach for efficient large language model (LLM) fine-tuning. Building on this paradigm, recent studies have proposed alternative initialization strategies and architectural modifications, reporting substantial improvements over vanilla LoRA. However, th...

Low-Rank Adaptation · large language models · fine-tuning · hyperparameter search · learning rate · +1 more

Feb 4, 2026 · 3

Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories

Sidi Lu, Zhenwen Liang, Dongyang Ma +3 more

In this paper, we aim to bridge test-time-training with a new type of parametric memory that can be flexibly offloaded from or merged into model parameters. We present Locas, a Locally-Supported parametric memory that shares the design of FFN blocks in modern transformers, allowing it to be flexibly...

parametric memory · transformer · FFN blocks · continual learning · low-rank sideway-FFN-style memories · +5 more

Feb 4, 2026 · 4

Horizon-LM: A RAM-Centric Architecture for LLM Training

Zhengqing Yuan, Lichao Sun, Yanfang +1 more

The rapid growth of large language models (LLMs) has outpaced the evolution of single-GPU hardware, making model scale increasingly constrained by memory capacity rather than computation. While modern training systems extend GPU memory through distributed parallelism and offloading across CPU and st...

large language models · distributed parallelism · offloading · GPU-centric execution · autograd graphs · +6 more

Feb 4, 2026 · 10

Rethinking the Trust Region in LLM Reinforcement Learning

Penghui Qi, Xiangxin Zhou, Zichen Liu +4 more

Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO) serving as the de facto standard algorithm. Despite its ubiquity, we argue that the core ratio clipping mechanism in PPO is structurally ill-suited for the large...

Proximal Policy Optimization · policy divergence · policy updates · token probability ratios · Monte Carlo estimates · +6 more