KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

Tingyu Wu, Zhisheng Chen, Ziyan Weng, Shuhe Wang, Chenglong Li, Shuo Zhang, Sen Hu, Silin Wu, Qizhen Lan, Huacan Wang, Ronghao Chen
arXiv ID: 2601.04745
Published: January 8, 2026
Authors: 11
Hugging Face Likes: 48
Comments: 2

Abstract

Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, where actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval. Our data is available at https://github.com/QuantaAlpha/KnowMeBench.
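
To make the setup concrete, below is a minimal sketch of what a flashback-aware, time-anchored event record and an evidence-linked question could look like. All class and field names here (Event, Question, narrative_time, evidence_ids, the category strings) are hypothetical illustrations, not the paper's actual schema; the released data format is defined in the repository linked above.

```python
# Hypothetical sketch of a time-anchored, evidence-linked benchmark record.
# Field names are illustrative assumptions, not the KnowMe-Bench schema.
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    event_id: str
    narrative_time: str   # when the event happened in the person's life
    discourse_order: int  # where it appears in the text; flashbacks make these differ
    text: str             # action, context, or inner-thought evidence

@dataclass
class Question:
    question_id: str
    category: str         # e.g. "factual_recall" | "subjective_state" | "principle_reasoning"
    prompt: str
    answer: str
    evidence_ids: List[str]  # links back to the Event records supporting the answer

# A flashback is anchored to its narrative time, not its position in the text:
events = [
    Event("e1", "2010-06", 2, "She quit the stable job her parents had arranged."),
    Event("e2", "2003-09", 1, "As a child, she hid her sketchbook from her father."),
]
question = Question(
    "q1",
    "principle_reasoning",
    "What decision principle best explains her choice in 2010?",
    "She prioritizes autonomy over external approval.",
    evidence_ids=["e1", "e2"],
)
```

Separating narrative_time from discourse_order is what makes the stream "flashback-aware": a model must reason over the reconstructed timeline rather than the order in which events are narrated, and evidence_ids let each answer be graded against the specific passages that support it.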
