MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models
Neutral · Artificial Intelligence
- A new dataset named MimeQA has been introduced, targeting socially intelligent AI that can interpret nonverbal social interactions through mime videos. The dataset comprises approximately 8 hours of video clips sourced from YouTube and is intended to push AI's understanding of nonverbal communication beyond today's language-dominant approaches (see the illustrative record sketch after this list).
- The development of MimeQA is significant because current AI systems excel at verbal communication but struggle with nonverbal cues. Since mime relies entirely on gesture, facial expression, and movement, the videos offer a focused test of a model's ability to follow nuanced social interactions, a capability that grows more important as AI becomes integrated into daily life.
- The initiative reflects a broader trend in AI research toward multimodal understanding, also seen in datasets such as ViMix-14M, which combines video and text for improved content generation. Work on AI summarization of video content for legal contexts likewise highlights the growing intersection of AI with other fields and the need for comprehensive understanding across different forms of communication.
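For readers who want a concrete picture of what video question-answering data of this kind looks like, the Python sketch below shows one plausible record layout and loader. The field names (video_id, question, answer, start_sec, end_sec) and the annotations.json filename are assumptions made for illustration; they are not taken from the published MimeQA release.

```python
# Minimal sketch of a hypothetical MimeQA-style record schema and loader.
# All field names and the annotations.json path are illustrative assumptions,
# not the dataset's actual format.
import json
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class MimeQARecord:
    video_id: str            # identifier of the source YouTube mime clip
    question: str            # question about a nonverbal social cue in the clip
    answer: str              # reference answer used for evaluation
    start_sec: float = 0.0   # optional clip boundaries within the full video
    end_sec: float = 0.0


def load_annotations(path: Path) -> List[MimeQARecord]:
    """Read a JSON list of QA annotations into typed records."""
    with path.open() as f:
        raw = json.load(f)
    return [MimeQARecord(**item) for item in raw]


if __name__ == "__main__":
    records = load_annotations(Path("annotations.json"))
    for r in records[:3]:
        print(f"[{r.video_id}] Q: {r.question} | A: {r.answer}")
```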
— via World Pulse Now AI Editorial System

