Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics
The field of Speech Emotion Recognition (SER) has seen significant progress, particularly through the integration of deep learning techniques and textual information. However, physiological signals produced during speech have been largely overlooked. To address this gap, researchers conducted experiments focusing on phonation excitation information and articulatory kinematics, leading to the creation of the STEM-E2VA dataset, which pairs audio with physiological recordings such as electroglottography (EGG) and electromagnetic articulography (EMA). These signals capture vocal-fold and articulator behavior, offering insights into speaker traits and emotional states. The study also examined the feasibility of using physiological data estimated from speech itself, rather than EGG and EMA data collected with dedicated sensors. The experimental results confirmed the effectiveness of incorporating physiological information into SER, highlighting its potential for real-world applications where sensor-based collection is impractical.
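The summary does not detail how the physiological information is combined with audio, but a common approach in multimodal SER is feature-level fusion: embeddings from each modality are concatenated before classification. The sketch below illustrates that idea only; the function names and feature dimensions are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def fuse_features(audio_feat, egg_feat, ema_feat):
    """Feature-level fusion: concatenate per-modality embeddings
    (audio, phonation excitation, articulatory kinematics) into a
    single vector for a downstream emotion classifier."""
    return np.concatenate([audio_feat, egg_feat, ema_feat])

# Hypothetical embedding sizes for one utterance.
rng = np.random.default_rng(0)
audio = rng.normal(size=128)  # e.g. acoustic embedding
egg = rng.normal(size=16)     # e.g. estimated EGG-derived features
ema = rng.normal(size=32)     # e.g. estimated EMA-derived features

fused = fuse_features(audio, egg, ema)
print(fused.shape)  # (176,)
```

In a full pipeline, the fused vector would feed an emotion classifier; when EGG/EMA sensors are unavailable, the physiological features would be replaced by values estimated from the speech signal, as the study explores.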
— via World Pulse Now AI Editorial System
