Res-Bench: Benchmarking the Robustness of Multimodal Large Language Models to Dynamic Resolution Input
Res-Bench is a newly introduced benchmark for evaluating the robustness of multimodal large language models (MLLMs) to images of varying resolutions. It comprises 14,400 samples spanning 12 resolution levels. Whereas existing assessments focus primarily on semantic performance, Res-Bench measures whether a model's behavior remains stable as input resolution changes, addressing a notable gap in current evaluation practice. The benchmark is detailed in a recent arXiv publication in the computer vision domain. By focusing on dynamic resolution input, Res-Bench gives researchers a tool for assessing and improving the reliability of multimodal models in real-world applications where image resolution can vary significantly.
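To make the idea of resolution robustness concrete, the following is a minimal sketch of how one might probe a model's answer consistency across resolution levels. It is not the Res-Bench protocol or metric: the resolution values, the `resize_to_long_side` and `resolution_consistency` helpers, and the `ask_model` callable are all illustrative assumptions.

```python
from PIL import Image

# Hypothetical resolution levels (long-side pixels). Res-Bench uses 12 levels,
# but this summary does not specify them, so these values are placeholders.
RESOLUTION_LEVELS = [64, 128, 256, 384, 512, 768, 1024]


def resize_to_long_side(image: Image.Image, target: int) -> Image.Image:
    """Resize so the longer side equals `target`, preserving aspect ratio."""
    w, h = image.size
    scale = target / max(w, h)
    return image.resize((max(1, round(w * scale)), max(1, round(h * scale))))


def resolution_consistency(image_path: str, question: str, ask_model) -> float:
    """Fraction of resolution levels at which the model's answer matches its
    answer at the highest resolution. `ask_model(image, question)` stands in
    for any MLLM inference call and is not part of Res-Bench itself."""
    image = Image.open(image_path).convert("RGB")
    reference = ask_model(resize_to_long_side(image, RESOLUTION_LEVELS[-1]), question)
    matches = sum(
        ask_model(resize_to_long_side(image, level), question) == reference
        for level in RESOLUTION_LEVELS[:-1]
    )
    return matches / (len(RESOLUTION_LEVELS) - 1)
```

A score near 1.0 under this toy metric would indicate that the model's answer is insensitive to resolution changes, which is the kind of stability Res-Bench is designed to evaluate; the benchmark's actual scoring may differ.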
