Can MLLMs Read the Room? A Multimodal Benchmark for Assessing Deception in Multi-Party Social Interactions

arXiv — cs.CV•Friday, November 21, 2025 at 5:00:00 AM

NeutralArtificial Intelligence

A recent study highlights the limitations of Multimodal Large Language Models (MLLMs) in detecting deception during complex social interactions, introducing a new benchmark called MIDA to evaluate their performance.
This development underscores the challenges faced by advanced AI models, particularly in understanding nuanced human communication, which is essential for applications in social robotics and virtual assistants.
The findings reflect ongoing concerns about the reliability of MLLMs, as they often fail to integrate multimodal cues effectively, a challenge echoed in various studies addressing hallucinations and misinformation in AI

— via World Pulse Now AI Editorial System

Read Original

Was this article worth reading? Share it

LucidQuery AI

Combines diffusion reasoning with autoregressive LLM for advanced AI analysis.

AI & DataTry the app

GPTHumanizer

Bypass AI detection with guaranteed undetectable content generation.

AI & DataTry the app

Supametas.AI

Extract and structure unstructured data for seamless LLM RAG integration.

AI & DataTry the app

Continue Readings

arXiv — cs.CLa day ago

Evaluating Large Language Models for Diacritic Restoration in Romanian Texts: A Comparative Study

PositiveArtificial Intelligence

A recent study evaluated the performance of various large language models (LLMs) in restoring diacritics in Romanian texts, highlighting the importance of automatic diacritic restoration for effective text processing in languages rich in diacritical marks. Models tested included OpenAI's GPT-3.5, GPT-4, and Google's Gemini 1.0 Pro, among others, with GPT-4o achieving notable accuracy in diacritic restoration.

Read full article

via arXiv — cs.CL

arXiv — cs.CVa day ago

R-AVST: Empowering Video-LLMs with Fine-Grained Spatio-Temporal Reasoning in Complex Audio-Visual Scenarios

PositiveArtificial Intelligence

The introduction of R-AVST marks a significant advancement in the field of multimodal large language models (MLLMs), focusing on fine-grained spatio-temporal reasoning in complex audio-visual scenarios. This dataset comprises over 5,000 untrimmed videos annotated with 27,000 objects across 100 types of events, enabling the development of three core tasks for evaluating model performance in audio-visual reasoning.

Read full article

via arXiv — cs.CV

arXiv — cs.CVa day ago

SpatialGeo:Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion

PositiveArtificial Intelligence

SpatialGeo has been introduced as a novel vision encoder that enhances the spatial reasoning capabilities of multimodal large language models (MLLMs) by integrating geometry and semantics features. This advancement addresses the limitations of existing MLLMs, particularly in interpreting spatial arrangements in three-dimensional space, which has been a significant challenge in the field.

Read full article

via arXiv — cs.CV

arXiv — cs.CVa day ago

Q-REAL: Towards Realism and Plausibility Evaluation for AI-Generated Content

PositiveArtificial Intelligence

A new dataset named Q-Real has been introduced to evaluate the realism and plausibility of AI-generated images, consisting of 3,088 images annotated for major entities and judgment questions. This initiative aims to enhance the quality assessment of generative models, moving beyond the limitations of existing datasets that provide only a single quality score.

Read full article

via arXiv — cs.CV

arXiv — cs.CLa day ago

Cross-cultural value alignment frameworks for responsible AI governance: Evidence from China-West comparative analysis

NeutralArtificial Intelligence

A recent study has introduced a Multi-Layered Auditing Platform for Responsible AI, aimed at evaluating cross-cultural value alignment in Large Language Models (LLMs) from China and the West. This research highlights the governance challenges posed by LLMs in high-stakes decision-making, revealing fundamental instabilities in value systems and demographic under-representation among leading models like Qwen and GPT-4o.

Read full article

via arXiv — cs.CL

arXiv — cs.CVa day ago

From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models

NeutralArtificial Intelligence

Recent advancements in Multimodal Large Language Models (MLLMs) have highlighted the need to enhance their reasoning capabilities, particularly through the Chain-of-Thought (CoT) paradigm. This approach aims to improve reasoning transparency and interpretability, addressing existing challenges such as opaque reasoning paths and limited generalization abilities. The systematic review of Multimodal Chain-of-Thought (MCoT) methods provides insights into their theoretical foundations and practical applications.

Read full article

via arXiv — cs.CV

arXiv — cs.LGa day ago

Revisiting Multimodal KV Cache Compression: A Frequency-Domain-Guided Outlier-KV-Aware Approach

PositiveArtificial Intelligence

A new approach to multimodal KV Cache compression has been proposed, focusing on the distribution of KV matrices' energy in the frequency domain. This method identifies and removes outlier KV pairs that deviate from the principal energy, which significantly impacts the performance of multimodal large language models (MLLMs). The study highlights the limitations of existing compression methods that rely solely on attention scores.

Read full article

via arXiv — cs.LG

VentureBeat — AIa day ago

Microsoft’s Fara-7B is a computer-use AI agent that rivals GPT-4o and works directly on your PC

PositiveArtificial Intelligence

Microsoft has launched Fara-7B, a new 7-billion parameter AI model designed to function as a Computer Use Agent (CUA) that operates directly on users' PCs. This model aims to perform complex tasks locally, enhancing privacy and reducing reliance on cloud-based systems.

Read full article

via VentureBeat — AI