
PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR

James Burgess, Jan N. Hansen, Duo Peng, Yuhui Zhang, Alejandro Lozano, Min Woo Sun, Emma Lundberg, Serena Yeung-Levy
Published: January 26, 2026
Authors: 8
Word Count: 2,562
Code: Includes code

Training AI search agents to search and reason over scientific papers.

Abstract

Search agents are language models (LMs) that reason and search knowledge bases (or the web) to answer questions; recent methods supervise only the final answer accuracy using reinforcement learning with verifiable rewards (RLVR). Most RLVR search agents tackle general-domain QA, which limits their relevance to technical AI systems in science, engineering, and medicine. In this work we propose training agents to search and reason over scientific papers -- this tests technical question-answering, it is directly relevant to real scientists, and the capabilities will be crucial to future AI Scientist systems. Concretely, we release a search corpus of 16 million biomedical paper abstracts and construct a challenging factoid QA dataset called PaperSearchQA with 60k samples answerable from the corpus, along with benchmarks. We train search agents in this environment to outperform non-RL retrieval baselines; we also perform further quantitative analysis and observe interesting agent behaviors like planning, reasoning, and self-verification. Our corpus, datasets, and benchmarks are usable with the popular Search-R1 codebase for RLVR training and released on https://huggingface.co/collections/jmhb/papersearchqa. Finally, our data creation methods are scalable and easily extendable to other scientific domains.
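To make "supervising only the final answer accuracy" concrete, here is a minimal sketch of a verifiable reward for factoid QA. It is an illustration under stated assumptions, not the paper's or Search-R1's actual implementation: the `<answer>...</answer>` tag format, the normalization rules, and the function names `normalize_answer`, `extract_answer`, and `exact_match_reward` are all hypothetical.

```python
import re
import string
from typing import List, Optional


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def extract_answer(rollout: str) -> Optional[str]:
    """Return the content of the last <answer>...</answer> span, if any.

    The tag format is an assumption made for this sketch.
    """
    matches = re.findall(r"<answer>(.*?)</answer>", rollout, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def exact_match_reward(rollout: str, gold_answers: List[str]) -> float:
    """Reward 1.0 if the final answer matches any gold answer, else 0.0.

    Only the final answer is scored; intermediate reasoning and search
    steps in the rollout receive no direct supervision.
    """
    predicted = extract_answer(rollout)
    if predicted is None:
        return 0.0
    predicted_norm = normalize_answer(predicted)
    return 1.0 if any(predicted_norm == normalize_answer(g) for g in gold_answers) else 0.0
```

Because the reward depends only on the extracted final answer, a rollout with any number of search calls is scored the same way; for example, `exact_match_reward("<answer>The BRCA1 gene.</answer>", ["BRCA1 gene"])` evaluates to `1.0` under these normalization rules.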

Key Takeaways

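  • Introduces a new environment for training scientific search agents: a search corpus of 16 million biomedical paper abstracts and the 60k-sample PaperSearchQA dataset.

  • Trains the agents with reinforcement learning with verifiable rewards (RLVR), supervising only final-answer accuracy.

  • Aims to make search agents directly useful to working scientists and to future AI Scientist systems.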

Limitations

  • Currently focused on biomedical paper abstracts.

  • Relies on a specific dataset of factoid questions.

Keywords

language models, reinforcement learning with verifiable rewards, search agents, knowledge bases, scientific papers, biomedical paper abstracts, factoid QA, PaperSearchQA, RLVR, Search-R1
