Benchmarking Corruption Robustness of LVLMs: A Discriminative Benchmark and Robustness Alignment Metric

arXiv — cs.CV · Tuesday, November 25, 2025 at 5:00:00 AM
  • A new benchmark called Bench-C has been introduced to evaluate the corruption robustness of large vision-language models (LVLMs). It addresses two limitations of existing evaluations: the prevalence of low-discriminative samples, and the failure of accuracy-only metrics to capture how the structure of a model's predictions degrades. Alongside the benchmark, the Robustness Alignment Score (RAS) is proposed to measure shifts in prediction uncertainty and calibration alignment under corruption; an illustrative sketch of such a score follows this summary.
  • Bench-C and RAS are significant because they sharpen the assessment of LVLM performance under visual corruption, a prerequisite for deployment in real-world applications. By focusing evaluation on discriminative samples (see the filtering sketch below), these tools could drive improvements in model robustness, benefiting industries that rely on AI for visual understanding and decision-making.
  • This advancement reflects a growing emphasis on the robustness of AI models against misleading inputs and visual corruptions, paralleling other recent efforts in the field. Various frameworks and benchmarks are emerging to tackle challenges such as hallucinations in LVLMs and the need for effective visual token management, indicating a broader trend towards enhancing the reliability and efficiency of AI systems in complex environments.
— via World Pulse Now AI Editorial System
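
The summary above does not give the actual RAS formula, so the following Python sketch is only one plausible construction of an uncertainty-and-calibration alignment score, not the paper's definition: it compares predictive entropy and expected calibration error (ECE) between clean and corrupted inputs and maps the combined shift into (0, 1]. The function names and the equal weighting of the two terms are assumptions.

```python
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each row of predicted class probabilities."""
    return -np.sum(probs * np.log(np.clip(probs, 1e-12, 1.0)), axis=1)

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray,
                               n_bins: int = 10) -> float:
    """Standard ECE: gap between confidence and accuracy, averaged over bins."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(conf, edges[1:-1])  # 0..n_bins-1; conf=1.0 lands in last bin
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def robustness_alignment_score(clean_probs: np.ndarray,
                               corrupt_probs: np.ndarray,
                               labels: np.ndarray) -> float:
    """Hypothetical RAS-style score in (0, 1]: 1.0 means corruption changed
    neither the model's uncertainty profile nor its calibration."""
    uncertainty_shift = np.abs(
        predictive_entropy(corrupt_probs) - predictive_entropy(clean_probs)
    ).mean()
    calibration_shift = abs(
        expected_calibration_error(corrupt_probs, labels)
        - expected_calibration_error(clean_probs, labels)
    )
    # Equal weighting of the two shift terms is an arbitrary choice for this sketch.
    return float(np.exp(-(uncertainty_shift + calibration_shift)))
```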
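
Bench-C's sample-selection procedure is likewise not described here. The sketch below shows one common heuristic for dropping low-discriminative samples: if every model, under every corruption condition, gets a sample uniformly right or uniformly wrong, that sample cannot separate robust models from brittle ones.

```python
import numpy as np

def discriminative_indices(correct: np.ndarray) -> np.ndarray:
    """Indices of samples whose outcomes vary somewhere in the grid.

    correct: boolean array of shape (n_samples, n_models, n_conditions),
    True where a model answers a sample correctly under a corruption
    condition. Samples answered uniformly (all True or all False) carry
    no signal about relative robustness and are filtered out.
    """
    flat = correct.reshape(correct.shape[0], -1)
    keep = flat.any(axis=1) & ~flat.all(axis=1)
    return np.flatnonzero(keep)

# Example: 4 samples, 2 models, 2 conditions (clean / corrupted).
outcomes = np.array([
    [[True, True], [True, True]],     # everyone right  -> dropped
    [[True, False], [True, True]],    # outcomes differ -> kept
    [[False, False], [False, False]], # everyone wrong  -> dropped
    [[True, False], [False, False]],  # outcomes differ -> kept
])
print(discriminative_indices(outcomes))  # [1 3]
```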

Continue Reading
Make LVLMs Focus: Context-Aware Attention Modulation for Better Multimodal In-Context Learning
Positive · Artificial Intelligence
A recent study has proposed Context-Aware Modulated Attention (CAMA) to enhance the performance of large vision-language models (LVLMs) in multimodal in-context learning (ICL). This method addresses inherent limitations in self-attention mechanisms, which have hindered LVLMs from fully utilizing provided context, even with well-matched in-context demonstrations.
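
The blurb does not spell out how CAMA modulates attention, so the PyTorch sketch below illustrates only the general idea of context-aware attention modulation: an additive bias on the attention logits that up-weights keys belonging to in-context demonstration tokens. The function name, the mask convention, and the scalar beta are illustrative assumptions, not CAMA's actual formulation.

```python
import torch

def demo_biased_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                          demo_mask: torch.Tensor,
                          beta: float = 1.0) -> torch.Tensor:
    """Scaled dot-product attention with an additive bias toward
    in-context demonstration tokens (illustrative only).

    q, k, v: (batch, seq, dim); demo_mask: (batch, seq) bool, True for
    tokens that belong to the in-context demonstrations.
    """
    d = q.size(-1)
    logits = q @ k.transpose(-2, -1) / d ** 0.5             # (batch, seq_q, seq_k)
    logits = logits + beta * demo_mask[:, None, :].float()  # boost demonstration keys
    return torch.softmax(logits, dim=-1) @ v
```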
CATP: Contextually Adaptive Token Pruning for Efficient and Enhanced Multimodal In-Context Learning
Positive · Artificial Intelligence
A new framework called Contextually Adaptive Token Pruning (CATP) has been introduced to enhance the efficiency of large vision-language models (LVLMs) by addressing the issue of redundant image tokens during multimodal in-context learning (ICL). This method aims to improve performance while reducing inference costs, which is crucial for applications requiring rapid domain adaptation.
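
CATP's scoring rule is also not given in this blurb; the sketch below shows the generic shape of relevance-based image-token pruning: score each visual token against a pooled context embedding and keep only the top fraction, preserving token order. The names and the dot-product scoring are assumptions for illustration, not CATP's actual criterion.

```python
import torch

def prune_image_tokens(image_tokens: torch.Tensor,
                       context_query: torch.Tensor,
                       keep_ratio: float = 0.5) -> torch.Tensor:
    """Drop the image tokens least relevant to the current context
    (generic relevance pruning, illustrative only).

    image_tokens: (n_tokens, dim); context_query: (dim,) pooled embedding
    of the text / in-context prompt.
    """
    scores = image_tokens @ context_query        # relevance score per token
    k = max(1, int(keep_ratio * image_tokens.size(0)))
    keep = scores.topk(k).indices.sort().values  # keep positional order
    return image_tokens[keep]
```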