MCN-CL: Multimodal Cross-Attention Network and Contrastive Learning for Multimodal Emotion Recognition

arXiv · cs.CV · Monday, November 17, 2025 at 5:00:00 AM


Recommended Readings
Improving Speech Emotion Recognition with Mutual Information Regularized Generative Model
Positive · Artificial Intelligence
Progress in speech emotion recognition (SER) has been hindered by the lack of large, high-quality labelled training datasets. A newly proposed framework uses cross-modal information transfer and mutual information regularization to improve data augmentation. Tested on the IEMOCAP, MSP-IMPROV, and MSP-Podcast benchmark datasets, it improved emotion prediction performance compared to existing methods.