FALCON: False-Negative Aware Learning of Contrastive Negatives in Vision-Language Alignment
FALCON (False-Negative Aware Learning of Contrastive Negatives) targets a known problem in vision-language pretraining (VLP): false negatives that arise from the many-to-many correspondence between images and texts. When an image-text pair that actually matches is treated as a negative, the contrastive objective receives conflicting supervision signals, which degrades the learned embedding space. The method uses a negative mining scheduler that adaptively selects negative samples of appropriate hardness for each anchor instance during mini-batch construction. It is reported to yield significant performance improvements across three vision-language learning frameworks (ALBEF, BLIP-2, and SigLIP-2) and across a broad range of downstream tasks, which suggests the approach is robust for improving cross-modal alignment.
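To make "negatives of appropriate hardness for each anchor" concrete, here is a minimal sketch of hardness-controlled negative selection. It is not FALCON's scheduler, which the summary does not specify. The function name `select_negatives`, the per-anchor `hardness` quantile, and the toy similarity scores are all assumptions made for illustration. The idea shown is that each anchor's candidates are ranked by similarity, and a hardness value in [0, 1] decides how far up that ranking to pick, so that an anchor whose most similar candidates are likely false negatives can be given a lower value.

```python
def select_negatives(sim, positive_idx, hardness):
    """Pick one negative per anchor at a target hardness quantile.

    sim:          list of rows; sim[i][j] is the similarity of anchor i
                  to candidate j (hypothetical scores).
    positive_idx: positive_idx[i] is the index of anchor i's paired positive.
    hardness:     hardness[i] in [0, 1]; 0 picks the least similar
                  candidate, 1 picks the most similar one.
    """
    chosen = []
    for i, row in enumerate(sim):
        # Candidate pool excludes the anchor's known positive.
        pool = [j for j in range(len(row)) if j != positive_idx[i]]
        # Order candidates from least to most similar (easy to hard).
        pool.sort(key=lambda j: row[j])
        # Map the hardness quantile onto a rank in the ordered pool.
        rank = round(hardness[i] * (len(pool) - 1))
        chosen.append(pool[rank])
    return chosen


# Toy example: two anchors, four candidates each.
sim = [
    [0.90, 0.10, 0.50, 0.80],  # anchor 0, positive is candidate 0
    [0.20, 0.95, 0.70, 0.30],  # anchor 1, positive is candidate 1
]
negatives = select_negatives(sim, positive_idx=[0, 1], hardness=[1.0, 0.5])
print(negatives)
```

In this sketch the hardness values are fixed inputs. An adaptive scheduler of the kind the summary describes would instead choose them per anchor during training.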
— via World Pulse Now AI Editorial System