Intelligent Image Search Algorithms Fusing Visual Large Models

arXiv — cs.CV · Wednesday, November 26, 2025 at 5:00:00 AM
  • A new framework called DetVLM enhances fine-grained image retrieval by integrating object detection with Visual Large Models (VLMs). Its two-stage pipeline uses a YOLO detector for efficient component-level screening, addressing the weakness of conventional methods in state-specific retrieval and zero-shot search.
  • DetVLM matters because it targets both the accuracy and the efficiency of image retrieval in critical fields such as security and industrial inspection, where precise identification of object components and their states is essential.
  • The work reflects a broader trend in artificial intelligence: fusing different model types, such as YOLO detectors and VLMs, to improve performance. The continuing evolution of object detection frameworks and their applications across domains, including fashion and anomaly detection, underscores the value of combining such technologies to meet complex retrieval challenges.
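The screen-then-verify idea behind DetVLM can be illustrated with a small sketch. Everything below is a hypothetical stand-in, not the paper's implementation: `detect_components` mimics a cheap YOLO-style pass that filters a gallery by component, and `vlm_verify_state` mimics a VLM answering a state question (e.g. "Is the door open?") only for candidates that survive the filter.

```python
# Toy two-stage retrieval pipeline: cheap detector screening, then VLM
# state verification on the surviving candidates only. All names and the
# in-memory "index" are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float


def detect_components(image_id, index):
    """Stage 1 stand-in: return precomputed detector hits for an image."""
    return [Detection(label, conf) for label, conf in index.get(image_id, [])]


def vlm_verify_state(image_id, component, state, vlm_answers):
    """Stage 2 stand-in: a VLM yes/no answer about a component's state."""
    return vlm_answers.get((image_id, component, state), False)


def search(component, state, gallery, index, vlm_answers, min_conf=0.5):
    """Return gallery images whose `component` is detected and in `state`."""
    hits = []
    for image_id in gallery:
        # Stage 1: component-level screening rejects most images cheaply.
        if not any(d.label == component and d.confidence >= min_conf
                   for d in detect_components(image_id, index)):
            continue
        # Stage 2: the (expensive) VLM check runs only on the survivors.
        if vlm_verify_state(image_id, component, state, vlm_answers):
            hits.append(image_id)
    return hits
```

In a real deployment, stage 1 would call a detector such as YOLO over an image gallery and stage 2 would prompt a VLM; the point of the split is that the expensive model only sees the small candidate set the detector lets through.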
— via World Pulse Now AI Editorial System

Continue Reading
IAG: Input-aware Backdoor Attack on VLM-based Visual Grounding
Neutral · Artificial Intelligence
A novel vulnerability in vision-language models (VLMs) has been identified through the introduction of IAG, a method that enables multi-target backdoor attacks on VLM-based visual grounding systems. This technique utilizes dynamically generated, input-aware triggers that are text-guided, allowing for imperceptible manipulation of visual inputs while maintaining normal performance on benign samples.
TRANSPORTER: Transferring Visual Semantics from VLM Manifolds
Positive · Artificial Intelligence
The paper introduces TRANSPORTER, a model-independent approach designed to enhance video generation by transferring visual semantics from Vision Language Models (VLMs). This method addresses the challenge of understanding how VLMs derive their predictions, particularly in complex scenes with various objects and actions. TRANSPORTER generates videos that reflect changes in captions across diverse attributes and contexts.
Zero-Reference Joint Low-Light Enhancement and Deblurring via Visual Autoregressive Modeling with VLM-Derived Modulation
Positive · Artificial Intelligence
A new generative framework has been proposed for enhancing low-light images and reducing blur, utilizing visual autoregressive modeling guided by perceptual priors from vision-language models. This approach addresses significant challenges in restoring dark images, which often suffer from low visibility, contrast, noise, and blur.
Peregrine: One-Shot Fine-Tuning for FHE Inference of General Deep CNNs
Positive · Artificial Intelligence
The recent paper titled 'Peregrine: One-Shot Fine-Tuning for FHE Inference of General Deep CNNs' addresses key challenges in adapting deep convolutional neural networks (CNNs) for fully homomorphic encryption (FHE) inference. It introduces a single-stage fine-tuning strategy and a generalized interleaved packing scheme to enhance the performance of CNNs while maintaining accuracy and supporting high-resolution image processing.
DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection
Positive · Artificial Intelligence
The introduction of DiffSeg30k marks a significant advancement in the detection of AI-generated content (AIGC) by providing a dataset of 30,000 diffusion-edited images with pixel-level annotations. This dataset allows for fine-grained detection of localized edits, addressing a gap in existing benchmarks that typically assess entire images without considering localized modifications.
From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation
Positive · Artificial Intelligence
A new framework has been introduced for automatic fashion captioning and hashtag generation, utilizing a retrieval-augmented approach that integrates multi-garment detection, attribute reasoning, and Large Language Model (LLM) prompting. This system aims to produce visually grounded and stylistically engaging text for fashion images, addressing the shortcomings of traditional end-to-end captioners in attribute fidelity and domain generalization.
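The detect-retrieve-prompt flow described above can be sketched as follows. The function names, the attribute-overlap scoring, and the toy catalog are all illustrative assumptions, not the framework's actual components: detected garments are matched against a reference catalog, and the retrieved descriptions ground the prompt handed to an LLM.

```python
# Hypothetical retrieval-augmented prompt construction for fashion captioning.
# Retrieval here is a simple attribute-overlap ranking over a toy catalog.

def retrieve_similar(garment, catalog, k=2):
    """Rank catalog entries by attribute overlap with a detected garment."""
    ranked = sorted(
        catalog,
        key=lambda entry: len(set(entry["attrs"]) & set(garment["attrs"])),
        reverse=True,
    )
    return ranked[:k]


def build_caption_prompt(garments, catalog):
    """Compose an LLM prompt grounded in detections and retrieved references."""
    lines = ["Write a visually grounded fashion caption for an outfit featuring:"]
    for garment in garments:
        refs = retrieve_similar(garment, catalog)
        ref_text = "; ".join(r["desc"] for r in refs) or "no close references"
        lines.append(
            f"- {garment['name']} ({', '.join(garment['attrs'])})"
            f" | similar items: {ref_text}"
        )
    return "\n".join(lines)
```

Grounding the prompt in retrieved, human-written descriptions is what pushes the LLM toward attribute-faithful text rather than generic captions; a production system would retrieve with visual embeddings rather than literal attribute overlap.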
Multimodal Real-Time Anomaly Detection and Industrial Applications
Positive · Artificial Intelligence
A comprehensive multimodal room-monitoring system has been developed, integrating synchronized video and audio processing for real-time activity recognition and anomaly detection. The system has undergone two iterations, with the advanced version featuring multi-model audio ensembles and hybrid object detection methods, significantly enhancing its accuracy and robustness.
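A minimal sketch of how a multi-model ensemble with hybrid audio/video fusion might flag anomalies. The majority-vote rule and the either-modality fusion logic are assumptions chosen for illustration, not the described system's design.

```python
# Toy hybrid anomaly detector: majority vote over audio-model labels,
# fused with object detections from video. All inputs are assumed shapes.

from collections import Counter


def ensemble_vote(predictions):
    """Majority vote across an ensemble of audio-model labels."""
    return Counter(predictions).most_common(1)[0][0]


def flag_anomaly(audio_labels, video_objects, normal_audio, normal_objects):
    """Raise an alert if either modality observes something unexpected."""
    audio_event = ensemble_vote(audio_labels)
    audio_anomaly = audio_event not in normal_audio
    video_anomaly = any(obj not in normal_objects for obj in video_objects)
    return audio_anomaly or video_anomaly
```

Fusing the modalities with an OR keeps recall high (either channel can trigger an alert); a real system would likely weight model confidences instead of counting votes.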