Can MLLMs Read the Room? A Multimodal Benchmark for Assessing Deception in Multi-Party Social Interactions

arXiv — cs.CV · Friday, November 21, 2025 at 5:00 AM
  • A recent study highlights the limitations of Multimodal Large Language Models (MLLMs) in detecting deception during complex social interactions, introducing a new benchmark called MIDA to evaluate their performance.
  • This development underscores the challenges faced by advanced AI models, particularly in understanding nuanced human communication, which is essential for applications in social robotics and virtual assistants.
  • The findings reflect ongoing concerns about the reliability of MLLMs, which often fail to integrate multimodal cues effectively, a challenge echoed in studies of hallucination and misinformation in AI (a sketch of how such a benchmark might be scored follows below).
— via World Pulse Now AI Editorial System
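
The article does not reproduce MIDA's evaluation protocol, but a benchmark of this kind typically reduces to scoring a model's truthful-versus-deceptive judgments against ground-truth labels. The sketch below is a hypothetical harness: the `query_mllm` callable, the record fields, and the label strings are assumptions for illustration, not the MIDA API.

```python
# Hypothetical evaluation harness for a deception-detection benchmark.
# `query_mllm`, the record fields, and the labels are assumptions, not the MIDA API.

from typing import Callable

def evaluate_deception(records: list[dict],
                       query_mllm: Callable[[str, str, str], str]) -> float:
    """Score binary truthful/deceptive predictions against gold labels."""
    correct = 0
    for rec in records:
        # Each record is assumed to pair a video clip and transcript with a
        # gold label ("truthful" or "deceptive") for one target utterance.
        prompt = ("Is the highlighted speaker being truthful or deceptive? "
                  "Answer with one word.")
        prediction = query_mllm(rec["video_path"], rec["transcript"], prompt)
        pred_label = "deceptive" if "decept" in prediction.lower() else "truthful"
        correct += int(pred_label == rec["label"])
    return correct / len(records)
```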


Continue Reading
Evaluating Large Language Models for Diacritic Restoration in Romanian Texts: A Comparative Study
Positive · Artificial Intelligence
A recent study evaluated the performance of various large language models (LLMs) in restoring diacritics in Romanian texts, highlighting the importance of automatic diacritic restoration for effective text processing in languages rich in diacritical marks. Models tested included OpenAI's GPT-3.5, GPT-4, and Google's Gemini 1.0 Pro, among others, with GPT-4o achieving notable accuracy in diacritic restoration.
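
The summary does not state the study's exact metric, but the core of such an evaluation is simple: strip diacritics from gold Romanian text, ask a model to restore them, and compare word by word. The sketch below assumes a hypothetical `restore_diacritics` callable standing in for any of the tested LLMs.

```python
# Minimal sketch of a diacritic-restoration evaluation.
# `restore_diacritics` is a hypothetical stand-in for an LLM call.

import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove combining marks, e.g. 'mângâiere' -> 'mangaiere'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def word_accuracy(gold: str, restore_diacritics) -> float:
    """Fraction of words restored exactly to the gold form."""
    restored = restore_diacritics(strip_diacritics(gold))  # model call (assumed interface)
    gold_words, restored_words = gold.split(), restored.split()
    if len(gold_words) != len(restored_words):
        return 0.0  # simplest policy when the model changes tokenization
    matches = sum(g == r for g, r in zip(gold_words, restored_words))
    return matches / len(gold_words)
```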
R-AVST: Empowering Video-LLMs with Fine-Grained Spatio-Temporal Reasoning in Complex Audio-Visual Scenarios
Positive · Artificial Intelligence
The introduction of R-AVST marks a significant advancement in the field of multimodal large language models (MLLMs), focusing on fine-grained spatio-temporal reasoning in complex audio-visual scenarios. This dataset comprises over 5,000 untrimmed videos annotated with 27,000 objects across 100 types of events, enabling the development of three core tasks for evaluating model performance in audio-visual reasoning.
SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion
Positive · Artificial Intelligence
SpatialGeo has been introduced as a novel vision encoder that enhances the spatial reasoning capabilities of multimodal large language models (MLLMs) by integrating geometry and semantics features. This advancement addresses the limitations of existing MLLMs, particularly in interpreting spatial arrangements in three-dimensional space, which has been a significant challenge in the field.
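
SpatialGeo's actual encoder design is not detailed in this summary; purely as an illustration, the sketch below fuses a geometry-aware feature stream with a semantic (CLIP-style) stream into a single token sequence for an MLLM. The module names and dimensions are assumptions, not the paper's architecture.

```python
# Illustrative geometry-semantics fusion of visual tokens (not SpatialGeo's
# actual architecture; dimensions and module names are assumptions).

import torch
import torch.nn as nn

class GeometrySemanticsFusion(nn.Module):
    def __init__(self, geom_dim: int = 256, sem_dim: int = 1024, out_dim: int = 4096):
        super().__init__()
        # Project each stream into a shared space, then mix with a small MLP.
        self.geom_proj = nn.Linear(geom_dim, out_dim)
        self.sem_proj = nn.Linear(sem_dim, out_dim)
        self.mixer = nn.Sequential(nn.LayerNorm(out_dim),
                                   nn.Linear(out_dim, out_dim),
                                   nn.GELU(),
                                   nn.Linear(out_dim, out_dim))

    def forward(self, geom_tokens: torch.Tensor, sem_tokens: torch.Tensor) -> torch.Tensor:
        # geom_tokens: (B, N, geom_dim), e.g. depth/point features per patch
        # sem_tokens:  (B, N, sem_dim),  e.g. CLIP-style patch embeddings
        fused = self.geom_proj(geom_tokens) + self.sem_proj(sem_tokens)
        return self.mixer(fused)  # (B, N, out_dim) tokens fed to the LLM
```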
Q-REAL: Towards Realism and Plausibility Evaluation for AI-Generated Content
Positive · Artificial Intelligence
A new dataset named Q-Real has been introduced to evaluate the realism and plausibility of AI-generated images, consisting of 3,088 images annotated for major entities and judgment questions. This initiative aims to enhance the quality assessment of generative models, moving beyond the limitations of existing datasets that provide only a single quality score.
Cross-cultural value alignment frameworks for responsible AI governance: Evidence from China-West comparative analysis
Neutral · Artificial Intelligence
A recent study has introduced a Multi-Layered Auditing Platform for Responsible AI, aimed at evaluating cross-cultural value alignment in Large Language Models (LLMs) from China and the West. This research highlights the governance challenges posed by LLMs in high-stakes decision-making, revealing fundamental instabilities in value systems and demographic under-representation among leading models like Qwen and GPT-4o.
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
Neutral · Artificial Intelligence
Recent advancements in Multimodal Large Language Models (MLLMs) have highlighted the need to enhance their reasoning capabilities, particularly through the Chain-of-Thought (CoT) paradigm. This approach aims to improve reasoning transparency and interpretability, addressing existing challenges such as opaque reasoning paths and limited generalization abilities. The systematic review of Multimodal Chain-of-Thought (MCoT) methods provides insights into their theoretical foundations and practical applications.
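
The review surveys many MCoT variants; at its simplest, the paradigm amounts to prompting the model to externalize intermediate visual reasoning before committing to an answer. The snippet below is a generic illustration built around a hypothetical `mllm_generate` call, not any specific method from the survey.

```python
# Generic multimodal chain-of-thought prompt (illustrative only;
# `mllm_generate` is a hypothetical image+text generation call).

COT_TEMPLATE = (
    "You are given an image and a question.\n"
    "First, describe the relevant visual evidence step by step.\n"
    "Then state your final answer on a new line prefixed with 'Answer:'.\n\n"
    "Question: {question}"
)

def answer_with_mcot(image, question: str, mllm_generate) -> tuple[str, str]:
    """Return (reasoning_trace, final_answer) parsed from the model output."""
    output = mllm_generate(image=image, prompt=COT_TEMPLATE.format(question=question))
    reasoning, _, answer = output.partition("Answer:")
    return reasoning.strip(), answer.strip()
```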
Revisiting Multimodal KV Cache Compression: A Frequency-Domain-Guided Outlier-KV-Aware Approach
Positive · Artificial Intelligence
A new approach to multimodal KV Cache compression has been proposed, focusing on the distribution of KV matrices' energy in the frequency domain. This method identifies and removes outlier KV pairs that deviate from the principal energy, which significantly impacts the performance of multimodal large language models (MLLMs). The study highlights the limitations of existing compression methods that rely solely on attention scores.
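
The summary gives only the outline of the method, so the sketch below illustrates the general idea rather than the paper's algorithm: score each cached key by how much of its spectral energy falls outside the principal (low-frequency) bins and flag the deviating entries as outliers. The function name, bin count, and threshold are assumptions; what is then done with the flagged entries is a separate policy choice.

```python
# Illustrative outlier scoring for KV-cache entries in the frequency domain
# (a sketch of the general idea, not the paper's exact algorithm).

import torch

def outlier_kv_mask(keys: torch.Tensor, low_freq_bins: int = 8,
                    threshold: float = 0.5) -> torch.Tensor:
    """Flag cached positions whose spectral energy deviates from the
    principal (low-frequency) energy.

    keys: (num_tokens, head_dim) cached key vectors for one attention head.
    Returns a boolean mask of shape (num_tokens,), True = outlier.
    """
    spectrum = torch.fft.rfft(keys, dim=-1)            # per-token frequency content
    energy = spectrum.abs().pow(2)                     # (num_tokens, num_bins)
    total = energy.sum(dim=-1).clamp_min(1e-12)
    principal = energy[:, :low_freq_bins].sum(dim=-1)  # energy in leading bins
    # Tokens whose energy is not concentrated in the principal bins are
    # treated as outliers; eviction vs. retention is decided elsewhere.
    return (principal / total) < threshold
```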
Microsoft’s Fara-7B is a computer-use AI agent that rivals GPT-4o and works directly on your PC
Positive · Artificial Intelligence
Microsoft has launched Fara-7B, a new 7-billion parameter AI model designed to function as a Computer Use Agent (CUA) that operates directly on users' PCs. This model aims to perform complex tasks locally, enhancing privacy and reducing reliance on cloud-based systems.