PISA-Bench: The PISA Index as a Multilingual and Multimodal Metric for the Evaluation of Vision-Language Models
PISA-Bench has been introduced as a multilingual benchmark for evaluating vision-language models (VLMs), derived from the expert-created PISA tests. It addresses shortcomings of current datasets, which often lack high-quality, human-verified examples and are primarily in English. By translating the PISA test examples from English into five additional languages (Spanish, German, Chinese, French, and Italian), PISA-Bench creates a fully parallel corpus covering six languages, so the same items can be compared directly across languages. Initial evaluations show that smaller models, particularly those with fewer than 20 billion parameters, fail to achieve high scores, and that performance degrades substantially on non-English splits. These results highlight the need for better resources for multilingual multimodal reasoning.
— via World Pulse Now AI Editorial System
