"It's trained by non-disabled people": Evaluating How Image Quality Affects Product Captioning with VLMs

arXiv — cs.CV · Thursday, November 13, 2025 at 5:00:00 AM
A recent study of Vision-Language Models (VLMs) offers significant insights into how effectively they serve blind and low-vision (BLV) individuals. Surveying 86 BLV participants, the researchers found that while VLMs can achieve 98% accuracy when recognizing products from high-quality images, accuracy drops to 75% when images suffer from common quality issues such as blur or misframing. This decline underscores the need for VLM evaluations that center the experiences of disabled users. As VLMs become more prevalent in assisting BLV individuals with everyday tasks, user-centered evaluations are essential to ensuring their reliability. The study offers concrete recommendations for researchers in human-computer interaction (HCI) and machine learning (ML) to improve these models so they better serve the information needs of BLV people.
— via World Pulse Now AI Editorial System
