Latest Multimodal AI Research Papers

Research on AI systems that process multiple types of data, covering vision-language models and cross-modal understanding.

48 Papers

More Images, More Problems? A Controlled Analysis of VLM Failure Modes

Anurag Das, Adrian Bulat, Alberto Baldrati +4 more

Large Vision Language Models (LVLMs) have demonstrated remarkable capabilities, yet their proficiency in understanding and reasoning over multiple images remains largely unexplored. While existing benchmarks have initiated the evaluation of multi-image models, a comprehensive analysis of their core ...

Large Vision Language Models, multi-image capabilities, benchmark, diagnostic experiments, cross-image aggregation, +2 more
Jan 12, 2026