Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
Xiaomin Yu, Yi Xin, Wenjie Zhang +12 more
Despite the success of multimodal contrastive learning in aligning visual and linguistic representations, a persistent geometric anomaly, the Modality Gap, remains: embeddings of distinct modalities expressing identical semantics occupy systematically offset regions. Prior approaches to bridge this ...