GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning
GigaBrain Team, Boyuan Wang, Chaojun Ni, and 22 others
Vision-language-action (VLA) models that directly predict multi-step action chunks from current observations face inherent limitations: constrained scene understanding and weak anticipation of future states. In contrast, video world models pre-trained on web-scale video corpora exhibit robu...