Q-Doc: Benchmarking Document Image Quality Assessment Capabilities in Multi-modal Large Language Models
Neutral · Artificial Intelligence
The paper 'Q-Doc: Benchmarking Document Image Quality Assessment Capabilities in Multi-modal Large Language Models' examines the underexplored potential of Multi-modal Large Language Models (MLLMs) for Document Image Quality Assessment (DIQA). It introduces a three-tiered evaluation framework that probes MLLMs' capabilities at coarse, middle, and fine levels of granularity. The study finds that while MLLMs exhibit nascent DIQA abilities, they still face significant limitations, including inconsistent scoring and misidentification of distortions.
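To make the three-tiered idea concrete, the sketch below shows one way such an evaluation could be wired up: a coarse quality rating, a middle-level distortion identification, and a fine-grained severity query against a multi-modal model. This is an illustrative assumption, not the paper's exact protocol; the `query_mllm` helper, the prompt wording, and the distortion labels are hypothetical placeholders.

```python
# Minimal sketch of a three-tiered DIQA probe for an MLLM.
# Assumes a hypothetical query_mllm(image_path, prompt) helper that returns
# the model's text reply; tier definitions are illustrative only.

from dataclasses import dataclass


@dataclass
class DIQAResult:
    coarse_score: str      # coarse tier: overall quality rating (e.g., 1-5)
    distortion_type: str   # middle tier: which distortion is present
    severity: str          # fine tier: how severe that distortion is


def query_mllm(image_path: str, prompt: str) -> str:
    """Placeholder for an actual multi-modal model call (local MLLM or API)."""
    raise NotImplementedError("Wire this to the MLLM you want to benchmark.")


def assess_document(image_path: str) -> DIQAResult:
    """Run three increasingly fine-grained quality queries on one document image."""
    coarse = query_mllm(
        image_path,
        "Rate the overall visual quality of this document image from 1 (bad) "
        "to 5 (excellent). Reply with the number only.",
    )
    middle = query_mllm(
        image_path,
        "Which distortion most affects this document image: blur, noise, "
        "low contrast, compression artifacts, or none? Reply with one label.",
    )
    fine = query_mllm(
        image_path,
        "How severe is that distortion: slight, moderate, or severe? "
        "Reply with one word.",
    )
    return DIQAResult(coarse_score=coarse, distortion_type=middle, severity=fine)
```

Keeping the three queries separate, rather than asking for everything in one prompt, mirrors the benchmark's motivation: it exposes where a model's answers become inconsistent as the required granularity increases.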
— via World Pulse Now AI Editorial System