DomainCQA: Crafting Knowledge-Intensive QA from Domain-Specific Charts

arXiv — cs.CL · Monday, November 17, 2025 at 5:00:00 AM
  • DomainCQA introduces a new framework for Chart Question Answering that prioritizes both visual understanding and knowledge-intensive reasoning over domain-specific charts.
  • This development is significant because it aims to improve the performance of multimodal large language models (MLLMs) on complex reasoning tasks, particularly in fields like astronomy, biochemistry, and economics, where deeper domain understanding is crucial.
  • While no related articles were identified, the emphasis on enhancing reasoning capabilities in MLLMs aligns with ongoing discussions in AI about the need for more sophisticated evaluation methods.
— via World Pulse Now AI Editorial System

Recommended Readings
Do Large Language Models (LLMs) Understand Chronology?
Neutral · Artificial Intelligence
Large language models (LLMs) are increasingly utilized in finance and economics, where their ability to understand chronology is critical. A study tested this capability through various chronological ordering tasks, revealing that while models like GPT-4.1 and GPT-5 can maintain local order, they struggle with creating a consistent global timeline. The findings indicate a significant drop in exact match rates as task complexity increases, particularly in conditional sorting tasks, highlighting inherent limitations in LLMs' chronological reasoning.
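As a rough illustration of why exact match is so unforgiving on global ordering, consider a toy scorer; the task format, events, and scoring below are illustrative assumptions, not the study's actual harness:

```python
from datetime import date

# Hypothetical mini-task: order dated financial events chronologically,
# then score the model's output with exact match over the whole sequence.
events = [
    ("Lehman Brothers files for bankruptcy", date(2008, 9, 15)),
    ("Dodd-Frank Act signed into law", date(2010, 7, 21)),
    ("Bear Stearns sold to JPMorgan", date(2008, 3, 16)),
]
gold = [name for name, when in sorted(events, key=lambda e: e[1])]

def exact_match(predicted: list[str], gold: list[str]) -> bool:
    """The entire predicted timeline must equal the gold order."""
    return predicted == gold

# A model can keep most adjacent pairs locally correct yet misplace one
# event and score 0, which is why exact-match rates fall sharply as
# sequences get longer or conditional constraints are added.
predicted = ["Lehman Brothers files for bankruptcy",
             "Bear Stearns sold to JPMorgan",
             "Dodd-Frank Act signed into law"]
print(exact_match(predicted, gold))  # False: one swapped pair zeroes the score
```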
Synthetic Survival Control: Extending Synthetic Controls for "When-If" Decision
Positive · Artificial Intelligence
The article presents Synthetic Survival Control (SSC), a novel method for estimating causal effects on time-to-event outcomes from observational data. SSC addresses challenges such as censoring and non-random treatment assignment, which complicate 'when-if' questions regarding event timing under specific interventions. By utilizing a panel data framework, SSC estimates counterfactual hazard trajectories for units experiencing different treatments over time, offering a weighted combination of observed trajectories from other units.
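The synthetic-control idea at SSC's core, a counterfactual trajectory expressed as a weighted combination of donor units' observed trajectories, can be sketched in a few lines; the projected-gradient weight fitting and the toy hazard data below are assumptions for illustration, not the paper's estimator:

```python
import numpy as np

# Sketch: fit simplex weights on pre-treatment hazards, then combine the
# donors' post-treatment hazards into a counterfactual for the treated unit.
rng = np.random.default_rng(0)
T_pre, T_post, n_donors = 12, 8, 5
donors_pre = rng.uniform(0.05, 0.20, size=(n_donors, T_pre))
donors_post = rng.uniform(0.05, 0.20, size=(n_donors, T_post))
true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
treated_pre = donors_pre.T @ true_w  # treated unit built from donors for the demo

w = np.full(n_donors, 1.0 / n_donors)
for _ in range(2000):
    grad = 2 * donors_pre @ (donors_pre.T @ w - treated_pre)
    w = np.clip(w - 0.01 * grad, 0.0, None)  # heuristic simplex projection:
    w /= w.sum()                             # clip to >= 0, renormalize to 1

counterfactual_post = donors_post.T @ w  # hazard path had treatment not occurred
print(np.round(w, 3))  # should approach [0.5, 0.3, 0.2, 0.0, 0.0]
```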
Evaluation of OpenAI o1: Opportunities and Challenges of AGI
Positive · Artificial Intelligence
This study evaluates OpenAI's o1-preview large language model, highlighting its performance across complex reasoning tasks in fields such as computer science, mathematics, and medicine. The model achieved a success rate of 83.3% in competitive programming, excelled at generating radiology reports, and demonstrated 100% accuracy on high-school-level math tasks. Its advanced natural language inference capabilities further underscore its potential in diverse applications.
Hindsight Distillation Reasoning with Knowledge Encouragement Preference for Knowledge-based Visual Question Answering
Positive · Artificial Intelligence
The article presents a new framework, Hindsight Distilled Reasoning (HinD) with Knowledge Encouragement Preference Optimization (KEPO), aimed at enhancing Knowledge-based Visual Question Answering (KBVQA). The framework addresses the limitations of existing methods that rely on implicit reasoning in multimodal large language models (MLLMs). By prompting a 7B-parameter MLLM to complete reasoning processes, it aims to improve the integration of external knowledge into visual question answering.
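KEPO's published objective is not reproduced here; methods in this family often build on a DPO-style pairwise preference loss, where the "chosen" sample would be the knowledge-encouraged rationale, so the following is a generic sketch under that assumption:

```python
import torch
import torch.nn.functional as F

def preference_loss(logp_chosen, logp_rejected,
                    ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Generic DPO-style pairwise loss over summed token log-probs.
    'Chosen' stands in for the knowledge-encouraged rationale; this is a
    common template, not KEPO's actual objective."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy log-probs for a preferred and a dispreferred rationale.
loss = preference_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                       torch.tensor([-13.0]), torch.tensor([-14.5]))
print(round(loss.item(), 4))
```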
MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating Multimodal Large Language Models Understanding of Complex Image
Positive · Artificial Intelligence
MOSABench is a newly introduced evaluation dataset aimed at addressing the lack of standardized benchmarks for multi-object sentiment analysis in multimodal large language models (MLLMs). It comprises approximately 1,000 images featuring multiple objects, requiring MLLMs to evaluate the sentiment of each object independently. Key features of MOSABench include distance-based target annotation and an improved scoring mechanism, highlighting current limitations in MLLMs' performance in this complex task.
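To make per-object evaluation concrete, here is a toy scorer; the distance bucketing is one plausible reading of "distance-based target annotation", not MOSABench's actual protocol:

```python
from math import dist

# Each annotated object carries its own sentiment label; the model is
# scored per object, optionally stratified by how close objects are
# (crowded objects are plausibly harder to judge independently).
annotations = [  # (object id, bbox center, gold sentiment) -- toy data
    ("person_left", (120, 340), "positive"),
    ("person_right", (880, 350), "negative"),
]
predictions = {"person_left": "positive", "person_right": "neutral"}

def per_object_accuracy(annotations, predictions, near_px=300.0):
    buckets = {"near": [], "far": []}
    centers = [c for _, c, _ in annotations]
    for name, center, gold in annotations:
        nearest = min(dist(center, c) for c in centers if c != center)
        bucket = "near" if nearest < near_px else "far"
        buckets[bucket].append(predictions.get(name) == gold)
    return {b: (sum(v) / len(v) if v else None) for b, v in buckets.items()}

print(per_object_accuracy(annotations, predictions))  # {'near': None, 'far': 0.5}
```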
AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models
Positive · Artificial Intelligence
The paper introduces AUVIC, a novel framework for adversarial unlearning of visual concepts in Multi-modal Large Language Models (MLLMs). This framework addresses data privacy concerns by enabling the removal of sensitive visual content without the need for extensive retraining. AUVIC utilizes adversarial perturbations to isolate target concepts while maintaining model performance on related entities. The study also presents VCUBench, a benchmark for evaluating the effectiveness of visual concept unlearning.
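AUVIC's adversarial-perturbation mechanism is not reproduced here; as a generic stand-in, unlearning methods often optimize a two-term objective that raises the loss on the target ("forget") concept while preserving it on related ("retain") entities. A minimal sketch of that template, with toy stand-ins for the model and batches:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def unlearning_step(model, forget_batch, retain_batch, optimizer,
                    retain_weight=1.0):
    """One forget/retain step: ascend on the target concept's loss,
    descend on related concepts'. Illustrative template only."""
    fx, fy = forget_batch
    rx, ry = retain_batch
    loss = -F.cross_entropy(model(fx), fy) \
           + retain_weight * F.cross_entropy(model(rx), ry)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: a linear head stands in for the MLLM component being edited.
model = nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
forget = (torch.randn(8, 16), torch.randint(0, 4, (8,)))
retain = (torch.randn(8, 16), torch.randint(0, 4, (8,)))
print(unlearning_step(model, forget, retain, opt))
```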
AirCopBench: A Benchmark for Multi-drone Collaborative Embodied Perception and Reasoning
Neutral · Artificial Intelligence
AirCopBench is a new benchmark introduced to evaluate Multimodal Large Language Models (MLLMs) in multi-drone collaborative perception tasks. It addresses the lack of comprehensive evaluation tools for multi-agent systems, which outperform single-agent setups in terms of coverage and robustness. The benchmark includes over 14,600 questions across various task dimensions, such as Scene Understanding and Object Understanding, designed to assess performance under challenging conditions.
VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models
Positive · Artificial Intelligence
VP-Bench is a newly introduced benchmark designed to evaluate the ability of multimodal large language models (MLLMs) to interpret visual prompts (VPs) in images. This benchmark addresses a significant gap in existing evaluations, as no systematic assessment of MLLMs' effectiveness in recognizing VPs has been conducted. VP-Bench utilizes a two-stage evaluation framework, involving 30,000 visualized prompts across eight shapes and 355 attribute combinations, to assess MLLMs' capabilities in VP perception and utilization.
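To make "visual prompt" concrete: it is a marker (circle, box, arrow, and so on) drawn directly onto the image that the model must notice and ground its answer in. A minimal sketch with Pillow follows; the shape set, colors, and question template are assumptions, not VP-Bench's generation pipeline:

```python
from PIL import Image, ImageDraw

def add_visual_prompt(image, bbox, shape="ellipse", color="red", width=4):
    """Overlay a simple visual prompt on a copy of the image."""
    out = image.copy()
    draw = ImageDraw.Draw(out)
    if shape == "ellipse":
        draw.ellipse(bbox, outline=color, width=width)
    elif shape == "rectangle":
        draw.rectangle(bbox, outline=color, width=width)
    elif shape == "line":  # simple stand-in for an arrow-style prompt
        draw.line(list(bbox), fill=color, width=width)
    return out

# Toy usage: blank canvas instead of a real photo, plus a paired question.
img = Image.new("RGB", (640, 480), "white")
prompted = add_visual_prompt(img, (200, 150, 400, 330), shape="ellipse")
question = "What object is inside the red ellipse?"
prompted.save("vp_example.png")
```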