Q-Doc: Benchmarking Document Image Quality Assessment Capabilities in Multi-modal Large Language Models

arXiv, cs.CV | Monday, November 17, 2025, 5:00 AM
The paper 'Q-Doc: Benchmarking Document Image Quality Assessment Capabilities in Multi-modal Large Language Models' examines the largely untapped potential of Multi-modal Large Language Models (MLLMs) for Document Image Quality Assessment (DIQA). It introduces a three-tiered evaluation framework that assesses MLLMs' DIQA capabilities at coarse, middle, and fine levels of granularity. The study finds that although MLLMs show nascent DIQA abilities, they have significant limitations, including inconsistent quality scoring and misidentification of distortion types.
— via World Pulse Now AI Editorial System
