arXiv:2511.10892v1 Announce Type: new 
Abstract: Multimodal emotion recognition plays a key role in many domains, including mental health monitoring, educational interaction, and human-computer interaction. However, existing methods often face three major challenges: unbalanced category distribution, the complexity of temporal modeling of dynamic facial action units, and the difficulty of feature fusion due to modal heterogeneity. With the explosive growth of multimodal data in social media scenarios, the need for an efficient cross-modal fusion framework for emotion recognition is becoming increasingly urgent. To this end, this paper proposes Multimodal Cross-Attention Network and Contrastive Learning (MCN-CL) for multimodal emotion recognition. It uses a triple query mechanism and a hard negative mining strategy to remove feature redundancy while preserving important emotional cues, effectively addressing the issues of modal heterogeneity and category imbalance. Experimental results on the IEMOCAP and MELD datasets show that the proposed method outperforms state-of-the-art approaches, with Weighted F1 scores improving by 3.42% and 5.73%, respectively.
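Because the abstract reports Weighted F1 and names category imbalance as a core challenge, it may help to see how that metric is usually computed. The following is a minimal sketch of the standard support-weighted F1 definition, not the authors' evaluation code; the function name `weighted_f1` and the toy labels are assumptions introduced for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's true count.

    Hypothetical helper for illustration; not taken from the MCN-CL paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Frequent classes contribute more, rare classes less.
        score += (n / total) * f1
    return score


if __name__ == "__main__":
    # Toy labels, invented for this example.
    truth = ["happy", "happy", "happy", "sad"]
    preds = ["happy", "happy", "sad", "sad"]
    print(round(weighted_f1(truth, preds), 4))
```

Since each class is weighted by its support, a model can score well on this metric while still doing poorly on rare emotions, which is one reason imbalance handling matters on datasets such as MELD.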

The paper 'MCN-CL: Multimodal Cross-Attention Network and Contrastive Learning for Multimodal Emotion Recognition' addresses multimodal emotion recognition, a task central to applications in mental health, education, and human-computer interaction. It introduces a framework that uses a triple query mechanism and hard negative mining to remove redundant features while preserving emotional cues, mitigating unbalanced category distribution and modal heterogeneity. On the IEMOCAP and MELD datasets, the reported Weighted F1 scores improve on state-of-the-art approaches by 3.42% and 5.73%, respectively.
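The summary does not specify how hard negative mining is implemented, so the sketch below illustrates only the generic idea: among samples with a different label, keep the ones most similar to the anchor and use them in an InfoNCE-style contrastive loss. All names (`cosine`, `hardest_negatives`, `contrastive_loss`), the cosine similarity choice, and the values of `k` and `temperature` are assumptions, not details from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero vectors have no defined cosine similarity")
    return dot / (norm_u * norm_v)


def hardest_negatives(anchor, negatives, k):
    """Return the k negatives most similar to the anchor (the 'hard' ones)."""
    ranked = sorted(negatives, key=lambda n: cosine(anchor, n), reverse=True)
    return ranked[:k]


def contrastive_loss(anchor, positive, negatives, k=2, temperature=0.1):
    """InfoNCE-style loss computed against only the k hardest negatives.

    Generic illustration; the loss used by MCN-CL may differ.
    """
    hard = hardest_negatives(anchor, negatives, k)
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in hard)
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    # Toy 2-D embeddings, invented for this example.
    anchor = [1.0, 0.0]
    positive = [1.0, 0.05]
    negatives = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    print(hardest_negatives(anchor, negatives, k=1))
    print(contrastive_loss(anchor, positive, negatives, k=1))
```

Restricting the denominator to the hardest negatives concentrates the gradient on confusable samples, which is the usual motivation for this technique when some emotion classes are easily mistaken for others.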

MCN-CL: Multimodal Cross-Attention Network and Contrastive Learning for Multimodal Emotion Recognition
