
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts

Chen Yang, Guangyue Peng, Jiaying Zhu, Ran Le, Ruixiang Feng, Tao Zhang, Xiyun Xu, Yang Song, Yiming Jia, Yuntao Wen, Yunzhi Xu, Zekai Wang, Zhenwei An, Zhicong Sun, Zongchao Chen
Published: February 13, 2026
Authors: 15
Word count: 7,209
Includes code

Nanbeige4.1-3B achieves reasoning, coding, and autonomy in just 3 billion parameters.

Abstract

We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine point-wise and pair-wise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards during reinforcement learning, optimizing both correctness and efficiency. For deep search, we perform complex data synthesis and incorporate turn-level supervision during training. This enables stable long-horizon tool interactions, allowing Nanbeige4.1-3B to reliably execute up to 600 tool-call turns for complex problem-solving. Extensive experimental results show that Nanbeige4.1-3B significantly outperforms prior models of similar scale, such as Nanbeige4-3B-2511 and Qwen3-4B, and even surpasses much larger models, such as Qwen3-30B-A3B. Our results demonstrate that small models can achieve both broad competence and strong specialization simultaneously, redefining the potential of 3B-parameter models.
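The complexity-aware code reward described above could be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the gating-plus-bonus structure, and all weights are assumptions chosen to show how correctness can gate the reward while runtime efficiency adds a bonus.

```python
# Hypothetical sketch of a complexity-aware code reward: correctness gates the
# reward, and beating a baseline runtime earns a capped efficiency bonus.
# All names and weights are illustrative assumptions, not the paper's method.

def code_reward(passed: bool, runtime_s: float, baseline_s: float,
                efficiency_weight: float = 0.3) -> float:
    """Return 0 for incorrect code; otherwise 1 plus a speedup-based bonus."""
    if not passed:
        return 0.0
    # Relative efficiency: > 1 when the solution beats the baseline runtime.
    speedup = baseline_s / max(runtime_s, 1e-9)
    # Cap the speedup at 2x so extreme outliers do not dominate the reward.
    bonus = efficiency_weight * min(speedup, 2.0) / 2.0
    return 1.0 + bonus
```

Gating on correctness first keeps the efficiency bonus from ever rewarding fast-but-wrong code.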

Key Takeaways

  1. Nanbeige4.1-3B successfully combines reasoning, code generation, and agentic behavior in a single 3B parameter model.

  2. Extended context length to 256k tokens and enhanced chain-of-thought reconstruction significantly improved reasoning capabilities.

  3. Two-stage reinforcement learning using both point-wise and pair-wise rewards captures different aspects of response quality.
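One way to picture combining the two reward types from the takeaway above: a point-wise reward scores a single response absolutely, while a pair-wise reward scores it relative to an alternative. The sketch below is an assumption for illustration only; the blending function, its names, and the weight `alpha` are not from the paper.

```python
# Hypothetical sketch: blending point-wise (absolute) and pair-wise (relative)
# reward signals for a candidate response. Names and weights are assumptions.

def pointwise_reward(score: float) -> float:
    """Absolute quality score for a single response, clipped to [0, 1]."""
    return max(0.0, min(1.0, score))

def pairwise_reward(score_a: float, score_b: float) -> float:
    """Preference margin: positive when response A is preferred over B."""
    return score_a - score_b

def combined_reward(score_a: float, score_b: float, alpha: float = 0.5) -> float:
    """Blend absolute quality of A with its relative preference over B."""
    return alpha * pointwise_reward(score_a) + (1 - alpha) * pairwise_reward(score_a, score_b)
```

The point-wise term anchors the reward to an absolute quality scale, while the pair-wise term preserves fine-grained preference information between near-equal responses.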

Limitations

  • Previous small models forced users to choose between reasoning, code generation, or agentic capabilities.

  • Most models struggle with long-horizon planning and maintaining meaningful tool interactions beyond a few steps.
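The turn-level supervision the abstract credits for stable long-horizon tool use could, as a rough sketch, assign a reward to each tool-call turn and aggregate over the trajectory. The discounting scheme and function below are illustrative assumptions, not the paper's training objective.

```python
# Hypothetical sketch: aggregating per-turn rewards over a long tool-call
# trajectory, discounting later turns. The discount scheme is an assumption.

def trajectory_return(turn_rewards: list[float], gamma: float = 0.99) -> float:
    """Discounted sum of per-turn rewards for one tool-call trajectory."""
    total = 0.0
    for t, reward in enumerate(turn_rewards):
        total += (gamma ** t) * reward
    return total
```

Supervising at the turn level gives credit to individual good tool calls, rather than relying on a single sparse outcome reward at the end of hundreds of turns.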

Keywords

unified generalist language model, reward modeling, reinforcement learning, tool-call turns, deep search, complex data synthesis, turn-level supervision, point-wise reward modeling, pair-wise reward modeling, code generation, general reasoning, agentic behavior, human-aligned responses, model optimization
