PISA-Bench: The PISA Index as a Multilingual and Multimodal Metric for the Evaluation of Vision-Language Models

arXiv — cs.CV · Thursday, November 13, 2025 at 5:00:00 AM
PISA-Bench has been introduced as a multilingual benchmark for evaluating vision-language models (VLMs), derived from the expert-created PISA (Programme for International Student Assessment) tests. The initiative addresses shortcomings of current datasets, which often lack high-quality, human-verified examples and are predominantly English-only. By translating the English PISA test examples into five additional languages (Spanish, German, Chinese, French, and Italian), PISA-Bench forms a fully parallel corpus for evaluating VLMs across languages. Initial evaluations show that models, particularly those with fewer than 20 billion parameters, fail to achieve high scores, and that performance degrades substantially on the non-English splits. These results underscore the need for improved resources in multilingual multimodal reasoning and pave the way for future work in the area.
— via World Pulse Now AI Editorial System
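As a rough illustration of how a fully parallel corpus supports this kind of cross-lingual comparison, below is a minimal Python sketch of a per-language evaluation loop. The data model (`Example`), its field names, and the `model_answer` callable are hypothetical placeholders for illustration only; they are not the authors' released code or the benchmark's actual schema.

```python
# Minimal sketch of per-language accuracy on a parallel multimodal benchmark.
# All names here are illustrative assumptions, not PISA-Bench's real API.

from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Example:
    language: str        # e.g. "en", "es", "de", "zh", "fr", "it"
    image_path: str      # path to the item's figure
    question: str        # question text in this split's language
    choices: list[str]   # answer options
    answer_idx: int      # index of the correct option


def evaluate(model_answer: Callable[[Example], int],
             examples: Iterable[Example]) -> dict[str, float]:
    """Return accuracy per language split for a VLM answering function."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for ex in examples:
        total[ex.language] += 1
        if model_answer(ex) == ex.answer_idx:
            correct[ex.language] += 1
    return {lang: correct[lang] / total[lang] for lang in total}
```

Because the splits are parallel translations of the same items, comparing the English split's accuracy against each translated split's accuracy isolates the effect of language on otherwise identical questions, which is how a degradation on non-English splits would surface.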


Continue Reading
Cascading multi-agent anomaly detection in surveillance systems via vision-language models and embedding-based classification
Positive · Artificial Intelligence
A new framework for cascading multi-agent anomaly detection in surveillance systems has been introduced, utilizing vision-language models and embedding-based classification to enhance real-time performance and semantic interpretability. This approach integrates various methodologies, including reconstruction-gated filtering and object-level assessments, to address the complexities of detecting anomalies in dynamic visual environments.
VMMU: A Vietnamese Multitask Multimodal Understanding and Reasoning Benchmark
Neutral · Artificial Intelligence
The introduction of VMMU, a Vietnamese Multitask Multimodal Understanding and Reasoning Benchmark, aims to assess the capabilities of vision-language models (VLMs) in interpreting and reasoning over visual and textual information in Vietnamese. This benchmark includes 2.5k multimodal questions across seven diverse tasks, emphasizing genuine multimodal integration rather than text-only cues.
