ANTS: Adaptive Negative Textual Space Shaping for OOD Detection via Test-Time MLLM Understanding and Reasoning

arXiv — cs.CV · Thursday, November 20, 2025 at 5:00:00 AM
  • ANTS introduces a novel approach to improve out-of-distribution (OOD) detection by adaptively shaping a negative textual space through test-time MLLM understanding and reasoning.
  • This development is significant as it enhances the accuracy of OOD detection methods, addressing the challenges posed by existing techniques that struggle with understanding OOD images and constructing accurate negative spaces.
  • The advancement aligns with ongoing efforts in the AI field to mitigate biases and improve model reliability, as seen in related works focusing on visual bias mitigation and the efficiency of multimodal models.
— via World Pulse Now AI Editorial System


Recommended Readings
Learning to Expand Images for Efficient Visual Autoregressive Modeling
Positive · Artificial Intelligence
The paper introduces Expanding Autoregressive Representation (EAR), a new paradigm for visual generation that mimics the human visual system's center-outward perception. This method improves efficiency by unfolding image tokens in a spiral order, allowing for parallel decoding and preserving spatial continuity. Additionally, a length-adaptive decoding strategy is proposed to enhance flexibility and speed, ultimately reducing computational costs and improving generation quality.
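The center-outward spiral unfolding described above can be illustrated with a toy ordering function. This is only a sketch of the idea, not EAR's actual implementation: it groups grid positions into square rings around the center (by Chebyshev distance) and traverses each ring by angle, so tokens in the same ring could in principle be decoded in parallel.

```python
import math

def spiral_order(h, w):
    """Order (row, col) token positions center-outward, ring by ring.

    A toy approximation of EAR-style center-outward unfolding: positions
    are sorted first by their ring (Chebyshev distance from the grid
    center), then by angle within the ring.
    """
    cy, cx = (h - 1) / 2, (w - 1) / 2

    def key(pos):
        r, c = pos
        ring = max(abs(r - cy), abs(c - cx))   # which concentric ring
        ang = math.atan2(r - cy, c - cx)       # position along the ring
        return (ring, ang)

    return sorted(((r, c) for r in range(h) for c in range(w)), key=key)

order = spiral_order(4, 4)  # the four central tokens come first
```

Grouping by ring is what enables the parallel decoding the paper mentions: all tokens of one ring are conditioned only on inner rings, so a whole ring can be emitted at once.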
Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models
Positive · Artificial Intelligence
Multimodal Large Language Models (MLLMs) have shown remarkable capabilities in tasks such as OCR and VQA, but hallucination remains a significant challenge. This paper is the first to explore verb hallucination in MLLMs, revealing that many state-of-the-art models exhibit severe issues with verb concepts. The study evaluates existing methods aimed at reducing hallucinations related to object concepts and assesses their effectiveness on verb hallucinations.
A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models
Positive · Artificial Intelligence
A comprehensive study on visual token redundancy in discrete diffusion-based multimodal large language models (dMLLMs) has been conducted, revealing significant computational overhead during inference due to full-sequence attention. The research highlights that visual redundancy primarily occurs in from-scratch dMLLMs when addressing long-answer tasks and examines the impact of visual token pruning on model efficiency and responses.
InvFusion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems
Positive · Artificial Intelligence
InvFusion is a novel approach that integrates supervised and zero-shot diffusion methods for solving inverse problems. It addresses the limitations of existing models by providing a degradation-aware posterior sampler that enhances accuracy while maintaining flexibility. This innovation is significant as it combines the strengths of both training-based and zero-shot techniques, marking a step forward in the application of diffusion models in various fields.
MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
Positive · Artificial Intelligence
The paper introduces MoDES, a novel framework designed to enhance the efficiency of Mixture-of-Experts (MoE) Multimodal Large Language Models (MLLMs) by implementing dynamic expert skipping. Traditional expert skipping methods, originally intended for unimodal models, lead to performance degradation in MLLMs due to their unique characteristics. MoDES aims to address these inefficiencies without requiring additional training, utilizing a globally-modulated local gating mechanism for improved inference.
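The idea of training-free expert skipping can be sketched as a thresholded router. The threshold rule and `global_scale` modulation below are assumptions standing in for the paper's globally-modulated local gating, whose exact form is not given in this summary:

```python
import numpy as np

def route_with_skipping(router_logits, base_thresh=0.1, global_scale=1.0):
    """Toy dynamic expert skipping (illustrative, not the MoDES rule).

    Experts whose softmax routing weight falls below a threshold are
    skipped outright and the surviving weights are renormalized.
    `global_scale` stands in for a global modulation of the local gate.
    """
    w = np.exp(router_logits - router_logits.max())
    w /= w.sum()
    keep = w >= base_thresh * global_scale
    if not keep.any():              # always keep at least the top expert
        keep[np.argmax(w)] = True
    pruned = np.where(keep, w, 0.0)
    return pruned / pruned.sum(), keep
```

Because skipping is a pure inference-time decision on the router's own outputs, no retraining is needed, which matches the training-free claim above.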
OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs
Positive · Artificial Intelligence
OmniSparse introduces a training-aware fine-grained sparse attention framework aimed at enhancing the performance of long-video multimodal large language models (MLLMs). Unlike existing methods that focus on inference-time acceleration, OmniSparse operates effectively during both training and inference by dynamically allocating token budgets. This approach includes mechanisms for query selection and key-value selection, addressing the limitations of traditional sparse attention methods.
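A minimal sketch of budgeted key-value selection: each query attends only to its top-`budget` keys by attention score. The dynamic per-layer budget allocation and query selection that OmniSparse adds are not modeled here; this only illustrates the fine-grained sparsity pattern itself.

```python
import numpy as np

def sparse_attention(q, k, v, budget):
    """Each query keeps only its `budget` highest-scoring keys (toy sketch).

    q: (Tq, d), k: (Tk, d), v: (Tk, dv). Masked scores become -inf, so
    the softmax assigns zero weight to pruned keys.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])                       # (Tq, Tk)
    # indices of the keys to drop for each query (all but the top-`budget`)
    drop = np.argpartition(-scores, budget - 1, axis=-1)[:, budget:]
    np.put_along_axis(scores, drop, -np.inf, axis=-1)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

For long videos Tk is huge, so shrinking each query's key set from Tk to a small budget is where the savings come from, during training as well as inference.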
FGM-HD: Boosting Generation Diversity of Fractal Generative Models through Hausdorff Dimension Induction
Positive · Artificial Intelligence
The article discusses a novel approach to enhancing the diversity of outputs in Fractal Generative Models (FGMs) while maintaining high visual quality. By incorporating the Hausdorff Dimension (HD), a concept from fractal geometry that quantifies structural complexity, the authors propose a learnable HD estimation method that predicts HD from image embeddings. This method aims to improve the diversity of generated images, addressing challenges such as image quality degradation and limited diversity enhancement in FGMs.
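FGM-HD learns to predict HD from image embeddings; as a reference point for what that predictor approximates, here is the classical box-counting estimate of fractal dimension on a binary mask. This is standard fractal geometry, not the paper's learnable method:

```python
import numpy as np

def box_counting_dimension(mask, sizes=(1, 2, 4, 8)):
    """Classical box-counting estimate of fractal dimension.

    Counts how many s-by-s boxes contain foreground pixels at each box
    size s; the dimension is the slope of log(count) vs. log(1/s).
    """
    h, w = mask.shape
    counts = []
    for s in sizes:
        # tile the mask into s-by-s boxes and count occupied boxes
        grid = mask[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s)
        counts.append(grid.any(axis=(1, 3)).sum())
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope   # count ~ size^(-dim), so the fitted slope is -dim
```

A filled 2-D region yields dimension 2, while fractal-like structures fall strictly between 1 and 2, which is the structural-complexity signal the paper feeds back to diversify generation.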
MAVias: Mitigate any Visual Bias
Positive · Artificial Intelligence
MAVias is an innovative approach aimed at mitigating biases in computer vision models, which is crucial for enhancing the trustworthiness of artificial intelligence systems. Traditional bias mitigation techniques often address a limited range of predefined biases, which restricts their effectiveness in diverse visual datasets that may contain multiple, unknown biases. MAVias utilizes foundation models to identify spurious associations between visual attributes and target classes, capturing a broad spectrum of visual features and translating them into language-coded potential biases for further…