Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making

Baichuan-M3 Team, Chengfeng Dou, Fan Yang, Fei Li, Jiyuan Jia, Qiang Ju, Shuai Wang, Tianpeng Li, Xiangrong Zeng, Yijie Zhou, Hongda Zhang, Jinyang Tai, Linzhuang Sun, Peidong Guo, Yichuan Mo, Xiaochuan Wang, Hengfu Cui, Zhishou Zhang
Published: February 6, 2026
Authors: 18
Word Count: 11,307

Advanced medical AI for reliable clinical decisions.

Abstract

We introduce Baichuan-M3, a medical-enhanced large language model engineered to shift the paradigm from passive question-answering to active, clinical-grade decision support. Addressing the limitations of existing systems in open-ended consultations, Baichuan-M3 utilizes a specialized training pipeline to model the systematic workflow of a physician. Key capabilities include: (i) proactive information acquisition to resolve ambiguity; (ii) long-horizon reasoning that unifies scattered evidence into coherent diagnoses; and (iii) adaptive hallucination suppression to ensure factual reliability. Empirical evaluations demonstrate that Baichuan-M3 achieves state-of-the-art results on HealthBench, the newly introduced HealthBench-Hallu and ScanBench, significantly outperforming GPT-5.2 in clinical inquiry, advisory and safety. The models are publicly available at https://huggingface.co/collections/baichuan-inc/baichuan-m3.

Key Takeaways

  • Unifies clinical inquiry with reliable decision-making.

  • Three-stage training framework optimizes individual competencies.

  • Achieves state-of-the-art performance on benchmarks.

Limitations

  • Requires extensive training data and supervision.

  • May still struggle with extremely ambiguous cases.

Keywords

large language model, clinical decision support, proactive information acquisition, long-horizon reasoning, hallucination suppression, HealthBench, HealthBench-Hallu, ScanBench
