DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents
Nikita Gupta, Riju Chatterjee, Lukas Haas, and 9 other authors
We introduce DeepSearchQA, a 900-prompt benchmark for evaluating agents on difficult multi-step information-seeking tasks across 17 different fields. Unlike traditional benchmarks that target single-answer retrieval or broad-spectrum factuality, DeepSearchQA features a dataset of challenging, handcr...