CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models

arXiv — cs.CV · Thursday, November 13, 2025 at 5:00:00 AM
The introduction of the CHOICE benchmark marks a significant advancement in the evaluation of Large Vision-Language Models (VLMs) for remote sensing. While these models have shown remarkable capabilities in Earth observation, a systematic framework for evaluating them has been lacking. CHOICE fills this gap with 10,507 problems built from data collected across 50 globally distributed cities. The benchmark organizes capabilities into two primary dimensions, perception and reasoning, which are further divided into secondary dimensions and leaf tasks, ensuring a thorough evaluation. An evaluation of 3 proprietary and 21 open-source VLMs revealed critical limitations, emphasizing the need for further development in this area. By offering a structured approach to assessing VLMs, CHOICE is positioned to serve as a valuable resource, providing insights into the challenges and potential of these models in the field …
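The hierarchical scoring described above (leaf tasks rolling up into secondary and then primary dimensions) can be sketched as follows. This is a minimal illustration of how such a benchmark might aggregate scores; the hierarchy entries and accuracy values are invented for the example and are not CHOICE's actual task names or results.

```python
# Hypothetical sketch: leaf-task accuracies roll up into secondary
# dimensions, which roll up into the two primary dimensions
# (perception, reasoning). All names and numbers are illustrative.
hierarchy = {
    "perception": {"object recognition": ["counting", "classification"]},
    "reasoning": {"spatial reasoning": ["relation", "route planning"]},
}
leaf_acc = {"counting": 0.6, "classification": 0.8,
            "relation": 0.5, "route planning": 0.7}

def dimension_scores(hierarchy, leaf_acc):
    """Average leaf-task accuracies up to each primary dimension."""
    scores = {}
    for primary, secondaries in hierarchy.items():
        leaves = [leaf for leaves in secondaries.values() for leaf in leaves]
        scores[primary] = sum(leaf_acc[leaf] for leaf in leaves) / len(leaves)
    return scores

scores = dimension_scores(hierarchy, leaf_acc)
```

A real benchmark harness would weight leaf tasks by problem count rather than averaging uniformly; the roll-up structure is the point here.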
— via World Pulse Now AI Editorial System


Recommended Readings
Frequency-Aware Vision-Language Multimodality Generalization Network for Remote Sensing Image Classification
Positive · Artificial Intelligence
The article presents a frequency-aware vision-language multimodality generalization network (FVMGN) for remote sensing image classification. The approach addresses data heterogeneity and a limitation of existing vision-language models (VLMs): their reliance on universal texts that ignore linguistic knowledge specific to different remote sensing modalities. The method combines a diffusion-based training-test-time augmentation strategy with a multimodal wavelet disentanglement module.
Curing Semantic Drift: A Dynamic Approach to Grounding Generation in Large Vision-Language Models
Positive · Artificial Intelligence
Large Vision-Language Models (LVLMs) often experience 'semantic drift', a phenomenon where they progressively detach from visual input, leading to hallucinations. Current training-free decoding strategies have limitations, including high computational costs and reliance on unreliable proxies. The introduction of Dynamic Logits Calibration (DLC) offers a novel, efficient solution to this issue. DLC operates in real-time, performing visual alignment checks to ensure that the generated outputs remain grounded in visual evidence.
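The core intuition, calibrating next-token logits toward visual evidence as alignment degrades, can be sketched roughly as below. The blending rule, function names, and the scalar alignment score are illustrative assumptions, not the paper's actual DLC method.

```python
import numpy as np

def calibrated_logits(lang_logits, visual_logits, alignment):
    """Hypothetical sketch of dynamic logit calibration: blend
    language-prior logits with visually grounded logits, shifting
    weight toward the visual branch as the running alignment score
    drops (i.e., as generation drifts from the image)."""
    alignment = float(np.clip(alignment, 0.0, 1.0))
    return alignment * lang_logits + (1.0 - alignment) * visual_logits

lang = np.array([2.0, 0.0])   # toy logits favoring a language-prior token
vis = np.array([0.0, 2.0])    # toy logits favoring a visually grounded token
well_aligned = calibrated_logits(lang, vis, 1.0)   # equals lang
drifting = calibrated_logits(lang, vis, 0.0)       # equals vis
```

The appeal of a scheme like this is that it needs no retraining: the correction is applied at decode time, token by token.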
Draft and Refine with Visual Experts
Positive · Artificial Intelligence
Recent advancements in Large Vision-Language Models (LVLMs) reveal their strong multimodal reasoning capabilities. However, these models often generate ungrounded or hallucinated responses due to an overreliance on linguistic priors rather than visual evidence. To address this issue, a new framework called Draft and Refine (DnR) has been proposed, which utilizes a question-conditioned metric to quantify the model's reliance on visual information, enhancing the accuracy and reliability of responses.
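One simple way to realize a "reliance on visual information" metric, sketched below under the assumption (ours, not the paper's) that it can be approximated by comparing answer confidence with and without the image: a small gap suggests the answer leans on linguistic priors and should be sent to the refine stage.

```python
# Hypothetical sketch of a question-conditioned visual-reliance check.
# The threshold and the confidence-gap formulation are illustrative
# assumptions, not the DnR framework's actual metric.
def visual_reliance(conf_with_image, conf_without_image):
    """Gap in answer confidence attributable to the image."""
    return conf_with_image - conf_without_image

def needs_refinement(conf_with_image, conf_without_image, threshold=0.1):
    """Route low-reliance drafts to visual experts for refinement."""
    return visual_reliance(conf_with_image, conf_without_image) < threshold
```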
Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification
Positive · Artificial Intelligence
The paper presents a new framework called Missing-aware Mixture-of-Loras (MaMOL) aimed at improving multimodal classification in remote sensing, which often faces challenges due to missing modalities from environmental factors or sensor failures. The proposed method reformulates the issue as a multi-task learning problem, utilizing a dual-routing mechanism to enhance adaptability and knowledge sharing among experts. This approach promises to improve classification performance significantly by addressing the limitations of existing methods that rely on complete multimodal data during training.
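The missing-aware routing idea, activating only the experts whose required modalities are actually present in a sample, can be sketched as below. The expert names and modality sets are invented for illustration; MaMOL's actual dual-routing mechanism over LoRA experts is more involved.

```python
# Hypothetical sketch of missing-aware expert routing: a sample is
# served only by experts whose modality requirements it satisfies.
def route(present_modalities, experts):
    """Return names of experts usable given the available modalities."""
    return [name for name, required in experts.items()
            if required <= present_modalities]  # subset check

experts = {
    "optical_only": {"optical"},
    "sar_only": {"sar"},
    "fusion": {"optical", "sar"},
}
route({"optical"}, experts)           # only the optical expert fires
route({"optical", "sar"}, experts)    # all three experts are usable
```

Framing missing modalities as routing rather than imputation is what lets training proceed without complete multimodal data.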
PAS: Prelim Attention Score for Detecting Object Hallucinations in Large Vision-Language Models
Positive · Artificial Intelligence
Large vision-language models (LVLMs) are increasingly recognized for their capabilities, but they face challenges due to object hallucinations. This study reveals that LVLMs often disregard the actual image and instead depend on previously generated output tokens to predict new objects. The research quantifies this behavior by analyzing the mutual information between the image and the predicted object, highlighting a strong correlation between weak image dependence and hallucination. The authors introduce the Prelim Attention Score (PAS), a novel, lightweight metric that can detect object hallucinations effectively without additional training.
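An attention-based score of this kind can be sketched as the fraction of attention mass the next-token prediction places on image tokens, averaged over heads; a low value signals weak image dependence. This is a minimal sketch of the general idea, not the paper's exact PAS definition, and the toy numbers are invented.

```python
import numpy as np

def prelim_attention_score(attn, image_token_mask):
    """Hypothetical sketch: share of attention mass on image tokens,
    averaged over heads, at the position predicting the next object.

    attn: (num_heads, seq_len) attention weights of the final position
    image_token_mask: (seq_len,) boolean mask marking image tokens
    """
    per_head = attn[:, image_token_mask].sum(axis=1)  # image mass per head
    return float(per_head.mean())

# Toy example: 2 heads, 4 tokens, the first two are image tokens.
attn = np.array([[0.4, 0.3, 0.2, 0.1],
                 [0.1, 0.1, 0.4, 0.4]])
mask = np.array([True, True, False, False])
score = prelim_attention_score(attn, mask)
```

Because such a score reads off quantities the model already computes, it adds essentially no inference cost and requires no extra training.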
Large language models in materials science and the need for open-source approaches
Positive · Artificial Intelligence
Large language models (LLMs) are significantly impacting materials science by enhancing the materials discovery pipeline. This review focuses on three main applications: mining scientific literature, predictive modeling, and multi-agent experimental systems. LLMs are capable of extracting synthesis conditions from texts, learning structure-property relationships, and coordinating systems that integrate computational tools with laboratory automation. The review advocates for the adoption of open-source models, which can match the performance of closed-source alternatives while providing transparency, reproducibility, cost-effectiveness, and data privacy.