OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
Letian Zhang, Sucheng Ren, Yanqing Liu +9 more
This paper presents OpenVision 3, a family of advanced vision encoders that learn a single, unified visual representation serving both image understanding and image generation. Our core architecture is simple: we feed VAE-compressed image latents to a ViT encoder and train its output t...
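To make the input side of this design concrete, here is a minimal sketch of how a VAE latent grid could be turned into a token sequence for a ViT. It is an illustration only, not the authors' implementation: the function name `patchify_latent`, the 4-channel 8x8 latent, and the patch size of 2 are all assumptions chosen for the example, and the sketch covers only tokenization, not the encoder or its training objective.

```python
from typing import List

# Assumed layout for illustration: latent[channel][row][column]
Latent = List[List[List[float]]]


def patchify_latent(latent: Latent, patch: int) -> List[List[float]]:
    """Split a C x H x W latent grid into non-overlapping patch tokens.

    Each token flattens one patch x patch window across all channels,
    giving (H // patch) * (W // patch) tokens of length C * patch * patch.
    """
    channels = len(latent)
    height = len(latent[0])
    width = len(latent[0][0])
    if height % patch or width % patch:
        raise ValueError("latent height and width must be divisible by patch")

    tokens: List[List[float]] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten the window channel by channel, row by row.
            token = [
                latent[c][top + dy][left + dx]
                for c in range(channels)
                for dy in range(patch)
                for dx in range(patch)
            ]
            tokens.append(token)
    return tokens


# Hypothetical latent: 4 channels, 8 x 8 spatial grid, filled with indices.
latent = [
    [[float(c * 64 + y * 8 + x) for x in range(8)] for y in range(8)]
    for c in range(4)
]
tokens = patchify_latent(latent, patch=2)
print(len(tokens), len(tokens[0]))  # 16 tokens, each of length 16
```

In a real encoder each token would then typically pass through a learned linear projection and position embedding before the transformer blocks; those steps are omitted here because the abstract does not specify them.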