Robotics & Embodied AI

RoboBrain 2.5: Depth in Sight, Time in Mind

Huajie Tan, Enshen Zhou, Zhiyu Li, Yijie Xu, Yuheng Ji, Xiansheng Chen, Cheng Chi, Pengwei Wang, Huizhu Jia, Yulong Ao, Mingyu Cao, Sixiang Chen, Zhe Li, Mengzhen Liu, Zixiao Wang, Shanyu Rong, Yaoxu Lyu, Zhongxia Zhao, Peterson Co, Yibo Li, Yi Han, Shaoxuan Xie, Guocai Yao, Songjing Wang, Leiduo Zhang, Xi Yang, Yance Jiao, Donghai Shi, Kunchang Xie, Shaokai Nie, Chunlei Men, Yonghua Lin, Zhongyuan Wang, Tiejun Huang, Shanghang Zhang
arXiv ID: 2601.14352
Published: January 20, 2026
Authors: 35
Hugging Face Likes: 8
Comments: 1

Abstract

We introduce RoboBrain 2.5, a next-generation embodied AI foundation model that advances general perception, spatial reasoning, and temporal modeling through extensive training on high-quality spatiotemporal supervision. Building upon its predecessor, RoboBrain 2.5 introduces two major capability upgrades. Specifically, it unlocks Precise 3D Spatial Reasoning by shifting from 2D pixel-relative grounding to depth-aware coordinate prediction and absolute metric constraint comprehension, generating complete 3D manipulation traces as ordered keypoint sequences under physical constraints. Complementing this spatial precision, the model establishes Dense Temporal Value Estimation, which provides dense, step-aware progress prediction and execution state understanding across varying viewpoints, producing stable feedback signals for downstream learning. Together, these upgrades extend the framework toward more physically grounded and execution-aware embodied intelligence for complex, fine-grained manipulation. The code and checkpoints are available at the project website: https://superrobobrain.github.io
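To make the two capability upgrades concrete, the sketch below models the outputs the abstract describes: a 3D manipulation trace as an ordered sequence of metric keypoints with physical constraints, and a dense, step-aware progress signal. This is a minimal illustration only; the class names (Keypoint3D, ManipulationTrace, TemporalValueEstimate) and all field choices are hypothetical and are not the paper's actual interface.

```python
# Hypothetical sketch (not the RoboBrain 2.5 API): the two structured outputs
# described in the abstract, expressed as plain Python data structures.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Keypoint3D:
    """One depth-aware waypoint in absolute metric (meter) coordinates."""
    x: float
    y: float
    z: float


@dataclass
class ManipulationTrace:
    """A complete 3D manipulation trace: an ordered keypoint sequence,
    optionally annotated with absolute metric constraints."""
    keypoints: List[Keypoint3D] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)  # e.g. clearance or approach limits


@dataclass
class TemporalValueEstimate:
    """Dense temporal value estimation: one progress value per execution step,
    usable as a stable feedback signal for downstream learning."""
    step_values: List[float] = field(default_factory=list)  # progress estimates in [0, 1]

    def current_progress(self) -> float:
        # Latest step's estimated task progress (0.0 if nothing observed yet).
        return self.step_values[-1] if self.step_values else 0.0


# Example: a short pick trace plus a progress signal over four observed steps.
trace = ManipulationTrace(
    keypoints=[
        Keypoint3D(0.42, -0.10, 0.31),  # pre-grasp approach
        Keypoint3D(0.42, -0.10, 0.12),  # grasp
        Keypoint3D(0.40, 0.05, 0.25),   # lift and move
    ],
    constraints=["approach within 0.02 m of the handle before closing the gripper"],
)
progress = TemporalValueEstimate(step_values=[0.0, 0.35, 0.70, 0.95])
print(len(trace.keypoints), progress.current_progress())
```

In this framing, the spatial head would populate something like ManipulationTrace, while the value head would emit the per-step estimates in TemporalValueEstimate; how the model actually parameterizes either output is not specified in this listing.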

Keywords

embodied AI, spatiotemporal supervision, 3D spatial reasoning, depth-aware coordinate prediction, metric constraint comprehension, 3D manipulation traces, temporal value estimation, step-aware progress prediction, execution state understanding
