Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs

arXiv — cs.CV · Thursday, November 13, 2025 at 5:00:00 AM
The recent publication of the Owl framework marks a significant step toward mitigating object hallucination in Large Vision-Language Models (LVLMs), a persistent failure mode in which generated text diverges from the visual input. The framework introduces the Visual-to-Textual Attention Contribution Ratio (VTACR), a metric that quantifies the imbalance between visual and textual attention during decoding. The research indicates that hallucinations are far more likely in low-VTACR scenarios, where textual priors overshadow visual grounding. Owl pairs this diagnostic with two interventions: a fine-grained attention intervention mechanism that dynamically reweights attention based on VTACR signals, and a dual-path contrastive decoding strategy that contrasts visually grounded predictions against hallucination-prone ones. Together, these components reduce hallucinations and improve the overall faithfulness of LVLMs, making the framework a pivotal development in the field o…
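
The metric and decoding strategy lend themselves to a compact illustration. The sketch below shows, under stated assumptions, how a per-head VTACR might be computed from a decoder's attention weights, how attention to visual tokens could be boosted when VTACR falls below a threshold, and how the two decoding paths could be contrasted at the logit level. The tensor shapes, the threshold, the boost factor, and the contrast weight `alpha` are all hypothetical choices for illustration; the paper's exact formulation may differ.

```python
import torch

def vtacr(attn: torch.Tensor, visual_mask: torch.Tensor) -> torch.Tensor:
    """Per-head Visual-to-Textual Attention Contribution Ratio.

    attn:        (num_heads, seq_len) attention weights from the current
                 query token to every context token (assumed shape).
    visual_mask: (seq_len,) bool, True at image-patch token positions.
    """
    visual_mass = attn[:, visual_mask].sum(dim=-1)
    textual_mass = attn[:, ~visual_mask].sum(dim=-1)
    return visual_mass / (textual_mass + 1e-6)

def boost_visual_attention(attn, visual_mask, threshold=0.2, boost=1.5):
    """Upweight visual tokens for heads whose VTACR falls below
    `threshold`, then renormalize each head's weights to sum to one."""
    low = vtacr(attn, visual_mask) < threshold           # (num_heads,)
    scale = torch.ones_like(attn)
    scale[low.unsqueeze(-1) & visual_mask.unsqueeze(0)] = boost
    boosted = attn * scale
    return boosted / boosted.sum(dim=-1, keepdim=True)

def dual_path_logits(grounded, hallucinated, alpha=1.0):
    """Contrastive decoding: amplify tokens the visually grounded path
    favors and the hallucination-prone path does not."""
    return (1 + alpha) * grounded - alpha * hallucinated

# Toy usage with random tensors standing in for real model states.
num_heads, seq_len, vocab = 8, 16, 100
attn = torch.softmax(torch.randn(num_heads, seq_len), dim=-1)
visual_mask = torch.zeros(seq_len, dtype=torch.bool)
visual_mask[:6] = True                  # pretend the first 6 tokens are visual
attn = boost_visual_attention(attn, visual_mask)
logits = dual_path_logits(torch.randn(vocab), torch.randn(vocab))
next_token = logits.argmax()
```

The `(1 + alpha) * grounded - alpha * hallucinated` combination is the standard contrastive-decoding form; whether Owl uses this exact weighting, or how it constructs the hallucination-prone path, is not specified in the summary above and is assumed here for the sketch.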
