Benchmarking Visual LLMs Resilience to Unanswerable Questions on Visually Rich Documents

arXiv (cs.CV) • Monday, November 17, 2025 at 5:00:00 AM

Neutral • Artificial Intelligence

The study benchmarks the resilience of Visual Large Language Models (VLLMs) to unanswerable questions about Visually Rich Documents (VRDs). VLLMs perform strongly on Visual Question Answering (VQA), but their ability to detect unanswerable queries remains a significant gap.
The work aims to make VLLMs more robust. These models are increasingly used in applications that require comprehension of complex documents, so greater robustness would improve their reliability in real-world settings.
The study's emphasis on benchmarking VLLMs against unanswerable questions reflects a growing trend in AI research: refining model capabilities and addressing limitations in how models handle nuanced queries.

— via World Pulse Now AI Editorial System

Recommended Readings

arXiv (cs.CV) • 4 hours ago

VLMs Guided Interpretable Decision Making for Autonomous Driving

Positive • Artificial Intelligence

Recent work in autonomous driving has investigated applying vision-language models (VLMs) within visual question answering (VQA) frameworks for driving decision-making. However, these methods often rely on handcrafted prompts and perform inconsistently, which limits their effectiveness in real-world scenarios. This study assesses state-of-the-art open-source VLMs on high-level decision-making tasks using ego-view visual inputs, revealing significant limitations in their ability to provide reliable, context-aware decisions.

arXiv (cs.CV) • 2 days ago

Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery

Positive • Artificial Intelligence

Geospatial chain-of-thought (CoT) reasoning is crucial for enhancing Visual Question Answering (VQA) on satellite imagery, especially in climate-related applications such as disaster monitoring and urban resilience planning. Current VQA models can interpret remote sensing data but often lack the structured reasoning needed for complex geospatial queries. A new framework that integrates CoT reasoning with Direct Preference Optimization (DPO) has been proposed, showing a 34.9% accuracy improvement on tasks such as detection and classification.
