Contrastive vision-language learning with paraphrasing and negation

arXiv — cs.LG · Friday, November 21, 2025 at 5:00:00 AM
  • The study introduces a novel approach to contrastive vision-language learning that explicitly accounts for paraphrasing and negation in the text (a rough, illustrative sketch of such an objective appears after this summary).
  • This development is significant because current models struggle with the nuanced meanings introduced by paraphrasing and negation; addressing this could lead to more robust vision-language models.
  • The challenges of aligning visual and textual data are part of a broader discourse in AI, where models such as InfoCLIP and QwenCLIP are also exploring ways to enhance semantic understanding and mitigate issues such as overfitting and catastrophic forgetting.
— via World Pulse Now AI Editorial System
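
The summary above does not spell out the paper's training objective, so the following is only a minimal, hypothetical sketch of how a CLIP-style contrastive loss could treat paraphrased captions as additional positives and negated captions as hard negatives. The function name, argument layout, and equal weighting of the two positives are illustrative assumptions, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss_with_paraphrase_and_negation(
    image_emb,       # (B, D) image embeddings from the vision encoder
    caption_emb,     # (B, D) embeddings of the original captions
    paraphrase_emb,  # (B, D) embeddings of paraphrased captions (extra positives)
    negated_emb,     # (B, D) embeddings of negated captions (hard negatives)
    temperature=0.07,
):
    """Hypothetical CLIP-style InfoNCE loss: each image has two positives
    (its caption and a paraphrase) and competes against all other captions
    plus the negated captions as hard negatives."""
    image_emb = F.normalize(image_emb, dim=-1)
    texts = F.normalize(torch.cat([caption_emb, paraphrase_emb, negated_emb]), dim=-1)

    logits = image_emb @ texts.t() / temperature   # (B, 3B) image-to-text similarities
    B = image_emb.size(0)
    idx = torch.arange(B)

    pos_caption = logits[idx, idx]                 # similarity to own caption
    pos_paraphrase = logits[idx, idx + B]          # similarity to own paraphrase

    # Log-partition over all candidate texts, negated hard negatives included.
    log_z = torch.logsumexp(logits, dim=1)
    loss = 0.5 * ((log_z - pos_caption) + (log_z - pos_paraphrase))
    return loss.mean()
```

The design point worth noting is that the negated captions never appear as positives; they only enlarge the denominator, which is one common way to push a text encoder to separate "a photo with no dog" from "a photo of a dog".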

Continue Reading
The Finer the Better: Towards Granular-aware Open-set Domain Generalization
Positive · Artificial Intelligence
The recent introduction of the Semantic-enhanced CLIP (SeeCLIP) framework addresses the challenges of Open-Set Domain Generalization (OSDG), particularly the risks associated with distinguishing known and unknown classes in vision-language models. SeeCLIP enhances semantic understanding by decomposing images into detailed semantic tokens, improving model performance in recognizing novel object categories amidst domain shifts.
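
The blurb does not describe how the semantic tokens are used, so the snippet below is only a hedged guess at one plausible inference-time use: pooling the image patches most similar to each class prompt and blending that with the standard global CLIP score. The function and its 50/50 blend are hypothetical and not taken from the SeeCLIP paper.

```python
import torch
import torch.nn.functional as F

def score_with_semantic_tokens(patch_feats, global_feat, class_text_embs, top_k=8):
    """Hypothetical scoring step: treat the top-k patches most aligned with a
    class prompt as that class's "semantic tokens" and mix their evidence with
    the usual global image-text similarity."""
    patches = F.normalize(patch_feats, dim=-1)        # (P, D) patch embeddings
    g = F.normalize(global_feat, dim=-1)              # (D,)   global image embedding
    t = F.normalize(class_text_embs, dim=-1)          # (C, D) class prompt embeddings

    global_scores = t @ g                              # (C,)   global similarities
    patch_scores = t @ patches.t()                     # (C, P) per-patch similarities
    token_scores = patch_scores.topk(top_k, dim=1).values.mean(dim=1)  # (C,)

    return 0.5 * global_scores + 0.5 * token_scores    # blended per-class scores
```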
Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models
Positive · Artificial Intelligence
A novel framework named ReCoVAD has been proposed for video anomaly detection (VAD), inspired by the human nervous system's dual pathways. This framework allows for selective frame processing, significantly reducing computational costs associated with dense frame-level inference. The approach leverages large pre-trained models, enhancing VAD's efficiency in applications such as security surveillance and autonomous driving.
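
The dual-pathway idea lends itself to a simple illustration: a cheap scorer decides which frames are worth sending to the expensive pre-trained model. The loop below is a generic sketch of that selective-processing pattern; the scorer, threshold, and stride are placeholders rather than ReCoVAD's actual components.

```python
import torch

def select_frames_for_heavy_model(frames, light_scorer, threshold=0.5, stride=4):
    """Generic selective-processing sketch: score every `stride`-th frame with a
    lightweight network and return only the indices that look novel enough to
    justify running the large pre-trained anomaly model."""
    selected = []
    for idx in range(0, frames.size(0), stride):       # frames: (T, C, H, W)
        with torch.no_grad():
            novelty = light_scorer(frames[idx].unsqueeze(0)).item()  # scalar score
        if novelty > threshold:
            selected.append(idx)
    return selected  # frame indices to forward to the heavy VAD model
```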
SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion
Positive · Artificial Intelligence
SpatialGeo has been introduced as a novel vision encoder that enhances the spatial reasoning capabilities of multimodal large language models (MLLMs) by integrating geometry and semantics features. This advancement addresses the limitations of existing MLLMs, particularly in interpreting spatial arrangements in three-dimensional space, which has been a significant challenge in the field.
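
The summary only says geometry and semantics features are integrated, so the module below is a minimal, assumed fusion head: per-patch geometry features (e.g. from a depth or point encoder) are concatenated with CLIP-style semantic features and projected into the LLM's token space. The class name and two-layer projector are illustrative, not SpatialGeo's published architecture.

```python
import torch
import torch.nn as nn

class GeometrySemanticsFusion(nn.Module):
    """Hypothetical fusion head: concatenate per-patch geometry and semantic
    features, then project them into the language model's embedding space."""
    def __init__(self, geom_dim, sem_dim, llm_dim):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(geom_dim + sem_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, geom_feats, sem_feats):
        # geom_feats: (B, N, geom_dim), sem_feats: (B, N, sem_dim)
        fused = torch.cat([geom_feats, sem_feats], dim=-1)
        return self.proj(fused)  # (B, N, llm_dim) visual tokens for the MLLM
```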
ATAC: Augmentation-Based Test-Time Adversarial Correction for CLIP
Positive · Artificial Intelligence
A new method called Augmentation-Based Test-Time Adversarial Correction (ATAC) has been proposed to enhance the robustness of the CLIP model against adversarial perturbations in images. This approach operates in the embedding space of CLIP, utilizing augmentation-induced drift vectors to correct embeddings based on angular consistency. The method has been shown to outperform previous state-of-the-art techniques by nearly 50% in robustness across various benchmarks.
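
Based only on the description above (drift vectors from augmented views, corrected via angular consistency), here is a hedged sketch of what such a test-time correction could look like in CLIP's embedding space. The correction strength `alpha` and the exact consistency weighting are assumptions; the real ATAC procedure may differ.

```python
import torch
import torch.nn.functional as F

def atac_style_correction(image, clip_image_encoder, augment_fns, alpha=1.0):
    """Sketch of test-time embedding correction: embed several augmented views,
    measure how consistently their drift vectors point away from the original
    embedding, and nudge the embedding along the mean drift when they agree."""
    with torch.no_grad():
        z = F.normalize(clip_image_encoder(image), dim=-1)               # (1, D)
        aug_z = torch.cat(
            [F.normalize(clip_image_encoder(aug(image)), dim=-1) for aug in augment_fns]
        )                                                                # (K, D)

    drifts = F.normalize(aug_z - z, dim=-1)                              # (K, D)
    mean_drift = drifts.mean(dim=0, keepdim=True)                        # (1, D)

    # Angular consistency: average cosine between each drift and the mean drift.
    consistency = (drifts @ F.normalize(mean_drift, dim=-1).t()).mean().clamp(min=0.0)

    return F.normalize(z + alpha * consistency * mean_drift, dim=-1)     # corrected embedding
```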
MindShot: A Few-Shot Brain Decoding Framework via Transferring Cross-Subject Prior and Distilling Frequency Domain Knowledge
Positive · Artificial Intelligence
A new framework named MindShot has been introduced to enhance brain decoding by reconstructing visual stimuli from brain signals, addressing challenges like individual differences and high data collection costs. This two-stage framework includes a Multi-Subject Pretraining (MSP) stage and a Fourier-based cross-subject Knowledge Distillation (FKD) stage, aiming to improve adaptability for clinical applications.
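
The summary names a Fourier-based distillation stage but not its loss, so the following is a minimal, assumed example of what a frequency-domain distillation term can look like: matching amplitude spectra between student and teacher features. The L1 amplitude loss is an illustrative choice, not necessarily MindShot's.

```python
import torch
import torch.nn.functional as F

def frequency_domain_distillation_loss(student_feats, teacher_feats):
    """Hypothetical Fourier-based distillation term: transform both feature
    tensors with a real FFT and match their amplitude spectra, so the few-shot
    subject model inherits frequency-domain structure from the pretrained teacher."""
    student_amp = torch.fft.rfft(student_feats, dim=-1).abs()
    teacher_amp = torch.fft.rfft(teacher_feats, dim=-1).abs()
    return F.l1_loss(student_amp, teacher_amp)
```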
SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge
Positive · Artificial Intelligence
The introduction of SaFeR-CLIP marks a significant advancement in enhancing the safety of vision-language models like CLIP by employing a proximity-aware approach to redirect unsafe concepts to semantically similar safe alternatives. This method minimizes representational changes while improving zero-shot accuracy by up to 8.0% compared to previous techniques.
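
The proximity-aware redirection described above can be illustrated with a small, assumed helper: pair each unsafe concept embedding with its most similar safe concept, which can then serve as the target the model is fine-tuned toward. The function is a sketch of the idea, not SafeR-CLIP's actual training code.

```python
import torch
import torch.nn.functional as F

def proximity_aware_targets(unsafe_text_embs, safe_text_embs):
    """Sketch: match each unsafe concept to its nearest safe concept by cosine
    similarity and return those safe embeddings as redirection targets."""
    u = F.normalize(unsafe_text_embs, dim=-1)   # (U, D) unsafe concept embeddings
    s = F.normalize(safe_text_embs, dim=-1)     # (S, D) safe concept embeddings
    nearest = (u @ s.t()).argmax(dim=1)         # index of the closest safe concept
    return safe_text_embs[nearest]              # (U, D) targets for fine-tuning
```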