[2503.14953] Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching
[Submitted on 19 Mar 2025 (v1), last revised 17 Jul 2025 (this version, v2)]
By Yang Liu and 4 other authors
Abstract: Enabling Visual Semantic Models to effectively handle multi-view description […]