Revisiting MLLM Based Image Quality Assessment: Errors and Remedy

arXiv (cs.CV), Wednesday, November 12, 2025 at 5:00:00 AM
Multi-modal large language models (MLLMs) have significantly advanced image quality assessment (IQA), but a mismatch remains: MLLMs emit discrete tokens, while IQA requires continuous quality scores. Earlier methods that convert token outputs into scores often introduce conversion errors, which limits the performance of MLLM-based IQA. To address this, the authors propose Q-Scorer, a framework that integrates a lightweight regression module and IQA-specific score tokens into the MLLM pipeline. In extensive experiments, Q-Scorer achieves state-of-the-art performance across multiple IQA benchmarks and generalizes well to mixed datasets. The work targets these conversion errors directly, aiming for more reliable MLLM-based image quality assessment.
— via World Pulse Now AI Editorial System
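To make the token-to-score mismatch concrete, here is a minimal Python sketch. It is not Q-Scorer's implementation: the five-level rating scale, the level values, and all function names are assumptions chosen for illustration. It contrasts two ways of reading a score from discrete level tokens (picking the most likely level, or averaging level values by probability) with a generic linear regression head that outputs a scalar directly.

```python
import math

# Hypothetical five-level rating scale; names and values are assumptions.
LEVEL_VALUES = {"bad": 1.0, "poor": 2.0, "fair": 3.0, "good": 4.0, "excellent": 5.0}


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax_score(level_logits):
    """Discrete readout: return the value of the single most likely level."""
    best = max(level_logits, key=level_logits.get)
    return LEVEL_VALUES[best]


def expected_score(level_logits):
    """Soft readout: probability-weighted average of the level values."""
    names = list(level_logits)
    probs = softmax([level_logits[n] for n in names])
    return sum(p * LEVEL_VALUES[n] for p, n in zip(probs, names))


def regression_score(features, weights, bias):
    """Generic linear head: map a feature vector straight to a scalar score."""
    return sum(w * f for w, f in zip(weights, features)) + bias


# Made-up logits where "fair" and "good" are almost equally likely.
logits = {"bad": -2.0, "poor": 0.0, "fair": 1.9, "good": 2.0, "excellent": -1.0}
print("argmax readout:  ", argmax_score(logits))
print("expected readout:", round(expected_score(logits), 3))
```

With these made-up logits, the argmax readout snaps to a single level and discards the near-tie between "fair" and "good", while the expected readout lands between them. A regression head avoids the level grid altogether, which is the general idea behind adding a regression module, though the actual design in the paper may differ.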


Recommended Readings
MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model
Positive | Artificial Intelligence
MicroVQA++ is a new high-quality microscopy reasoning dataset for multimodal large language models (MLLMs). Derived from the BIOMEDICA archive, it is built in three stages: expert-validated figure-caption pairs, a novel heterogeneous graph that filters out inconsistent samples, and human-checked multiple-choice questions. The dataset aims to improve scientific reasoning in biomedical imaging, where progress has been limited by the lack of large-scale training data.
Sat2RealCity: Geometry-Aware and Appearance-Controllable 3D Urban Generation from Satellite Imagery
Positive | Artificial Intelligence
Sat2RealCity is a framework for generating 3D urban environments from real-world satellite images. It addresses significant challenges in existing methods, such as the need for extensive 3D city assets and the limitations of semantic or height maps. By focusing on individual building entities, Sat2RealCity improves realism and generalizability in urban modeling.