ANTS: Adaptive Negative Textual Space Shaping for OOD Detection via Test-Time MLLM Understanding and Reasoning

arXiv — cs.CV · Thursday, November 20, 2025 at 5:00:00 AM
  • ANTS introduces a novel approach to improve out-of-distribution (OOD) detection by adaptively shaping a negative textual space through test-time MLLM understanding and reasoning.
  • This development is significant as it enhances the accuracy of OOD detection methods, addressing the challenges posed by existing techniques that struggle with understanding OOD images and constructing accurate negative spaces.
  • The advancement aligns with ongoing efforts in the AI field to mitigate biases and improve model reliability, as seen in related works focusing on visual bias mitigation and the efficiency of multimodal models.
— via World Pulse Now AI Editorial System


Recommended Readings
Learning to Expand Images for Efficient Visual Autoregressive Modeling
Positive · Artificial Intelligence
The paper introduces Expanding Autoregressive Representation (EAR), a new paradigm for visual generation that mimics the human visual system's center-outward perception. This method improves efficiency by unfolding image tokens in a spiral order, allowing for parallel decoding and preserving spatial continuity. Additionally, a length-adaptive decoding strategy is proposed to enhance flexibility and speed, ultimately reducing computational costs and improving generation quality.
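The center-outward spiral unfolding described above can be illustrated with a toy ordering function. This is only a sketch of the idea, not EAR's actual implementation: it groups grid positions into square rings around the center (by Chebyshev distance) and traverses each ring by angle, so tokens in the same ring could in principle be decoded in parallel.

```python
import math

def spiral_order(h, w):
    """Order (row, col) token positions center-outward, ring by ring.

    A toy approximation of EAR-style center-outward unfolding: positions
    are sorted first by their ring (Chebyshev distance from the grid
    center), then by angle within the ring.
    """
    cy, cx = (h - 1) / 2, (w - 1) / 2

    def key(pos):
        r, c = pos
        ring = max(abs(r - cy), abs(c - cx))   # which concentric ring
        ang = math.atan2(r - cy, c - cx)       # position along the ring
        return (ring, ang)

    return sorted(((r, c) for r in range(h) for c in range(w)), key=key)

order = spiral_order(4, 4)  # the four central tokens come first
```

Grouping by ring is what enables the parallel decoding the paper mentions: all tokens of one ring are conditioned only on inner rings, so a whole ring can be emitted at once.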
Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models
Positive · Artificial Intelligence
Multimodal Large Language Models (MLLMs) have shown remarkable capabilities in tasks such as OCR and VQA, but hallucination remains a significant challenge. This paper is the first to explore verb hallucination in MLLMs, revealing that many state-of-the-art models exhibit severe issues with verb concepts. The study evaluates existing methods aimed at reducing hallucinations related to object concepts and assesses their effectiveness on verb hallucinations.
A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models
Positive · Artificial Intelligence
A comprehensive study on visual token redundancy in discrete diffusion-based multimodal large language models (dMLLMs) has been conducted, revealing significant computational overhead during inference due to full-sequence attention. The research highlights that visual redundancy primarily occurs in from-scratch dMLLMs when addressing long-answer tasks and examines the impact of visual token pruning on model efficiency and responses.
InvFusion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems
Positive · Artificial Intelligence
InvFusion is a novel approach that integrates supervised and zero-shot diffusion methods for solving inverse problems. It addresses the limitations of existing models by providing a degradation-aware posterior sampler that enhances accuracy while maintaining flexibility. This innovation is significant as it combines the strengths of both training-based and zero-shot techniques, marking a step forward in the application of diffusion models in various fields.
MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
Positive · Artificial Intelligence
The paper introduces MoDES, a novel framework designed to enhance the efficiency of Mixture-of-Experts (MoE) Multimodal Large Language Models (MLLMs) by implementing dynamic expert skipping. Traditional expert skipping methods, originally intended for unimodal models, lead to performance degradation in MLLMs due to their unique characteristics. MoDES aims to address these inefficiencies without requiring additional training, utilizing a globally-modulated local gating mechanism for improved inference.
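The idea of training-free expert skipping can be sketched as a thresholded router. The threshold rule and `global_scale` modulation below are assumptions standing in for the paper's globally-modulated local gating, whose exact form is not given in this summary:

```python
import numpy as np

def route_with_skipping(router_logits, base_thresh=0.1, global_scale=1.0):
    """Toy dynamic expert skipping (illustrative, not the MoDES rule).

    Experts whose softmax routing weight falls below a threshold are
    skipped outright and the surviving weights are renormalized.
    `global_scale` stands in for a global modulation of the local gate.
    """
    w = np.exp(router_logits - router_logits.max())
    w /= w.sum()
    keep = w >= base_thresh * global_scale
    if not keep.any():              # always keep at least the top expert
        keep[np.argmax(w)] = True
    pruned = np.where(keep, w, 0.0)
    return pruned / pruned.sum(), keep
```

Because skipping is a pure inference-time decision on the router's own outputs, no retraining is needed, which matches the training-free claim above.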
OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs
Positive · Artificial Intelligence
OmniSparse introduces a training-aware fine-grained sparse attention framework aimed at enhancing the performance of long-video multimodal large language models (MLLMs). Unlike existing methods that focus on inference-time acceleration, OmniSparse operates effectively during both training and inference by dynamically allocating token budgets. This approach includes mechanisms for query selection and key-value selection, addressing the limitations of traditional sparse attention methods.
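A minimal sketch of budgeted key-value selection: each query attends only to its top-`budget` keys by attention score. The dynamic per-layer budget allocation and query selection that OmniSparse adds are not modeled here; this only illustrates the fine-grained sparsity pattern itself.

```python
import numpy as np

def sparse_attention(q, k, v, budget):
    """Each query keeps only its `budget` highest-scoring keys (toy sketch).

    q: (Tq, d), k: (Tk, d), v: (Tk, dv). Masked scores become -inf, so
    the softmax assigns zero weight to pruned keys.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])                       # (Tq, Tk)
    # indices of the keys to drop for each query (all but the top-`budget`)
    drop = np.argpartition(-scores, budget - 1, axis=-1)[:, budget:]
    np.put_along_axis(scores, drop, -np.inf, axis=-1)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

For long videos Tk is huge, so shrinking each query's key set from Tk to a small budget is where the savings come from, during training as well as inference.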
FGM-HD: Boosting Generation Diversity of Fractal Generative Models through Hausdorff Dimension Induction
Positive · Artificial Intelligence
The article discusses a novel approach to enhancing the diversity of outputs in Fractal Generative Models (FGMs) while maintaining high visual quality. By incorporating the Hausdorff Dimension (HD), a concept from fractal geometry that quantifies structural complexity, the authors propose a learnable HD estimation method that predicts HD from image embeddings. This method aims to improve the diversity of generated images, addressing challenges such as image quality degradation and limited diversity enhancement in FGMs.
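FGM-HD learns to predict HD from image embeddings; as a reference point for what that predictor approximates, here is the classical box-counting estimate of fractal dimension on a binary mask. This is standard fractal geometry, not the paper's learnable method:

```python
import numpy as np

def box_counting_dimension(mask, sizes=(1, 2, 4, 8)):
    """Classical box-counting estimate of fractal dimension.

    Counts how many s-by-s boxes contain foreground pixels at each box
    size s; the dimension is the slope of log(count) vs. log(1/s).
    """
    h, w = mask.shape
    counts = []
    for s in sizes:
        # tile the mask into s-by-s boxes and count occupied boxes
        grid = mask[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s)
        counts.append(grid.any(axis=(1, 3)).sum())
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope   # count ~ size^(-dim), so the fitted slope is -dim
```

A filled 2-D region yields dimension 2, while fractal-like structures fall strictly between 1 and 2, which is the structural-complexity signal the paper feeds back to diversify generation.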
MAVias: Mitigate any Visual Bias
Positive · Artificial Intelligence
MAVias is an innovative approach aimed at mitigating biases in computer vision models, which is crucial for enhancing the trustworthiness of artificial intelligence systems. Traditional bias mitigation techniques often address a limited range of predefined biases, which restricts their effectiveness in diverse visual datasets that may contain multiple, unknown biases. MAVias utilizes foundation models to identify spurious associations between visual attributes and target classes, capturing a broad spectrum of visual features and translating them into language-coded potential biases for further…