UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation

arXiv — cs.CV · Wednesday, November 12, 2025 at 5:00:00 AM
UniCUE represents a breakthrough in assistive technology, targeting the communication needs of the hearing-impaired through Cued Speech Video-to-Speech generation (CSV2S). Traditional methods relied primarily on Cued Speech Recognition (CSR), which first transcribes the video content into text, creating the potential for error propagation and misalignment in the generated speech. UniCUE removes the intermediate text entirely, generating intelligible speech directly from Cued Speech videos. This is particularly significant given the inherent complexity of the multimodal data and the limited availability of Cued Speech datasets. By integrating the CSR task into the framework, UniCUE obtains fine-grained visual-semantic cues that guide the speech generation process, improving the accuracy and effectiveness of communication for users. The development of a large-scale Mandarin Cued Speech dataset, UniCUE-HI, further supports this initiative, paving the way for more robust…
— via World Pulse Now AI Editorial System
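To make the described design concrete, the following is a minimal, hypothetical sketch of a unified recognition-plus-generation pipeline in PyTorch: a shared visual encoder, a CSR head whose frame-level posteriors serve as fine-grained visual-semantic cues, and a speech decoder that consumes those cues directly to predict mel-spectrogram frames, with no intermediate text decoding step. All module names, dimensions, and wiring here are illustrative assumptions, not the actual UniCUE implementation.

```python
# A minimal, hypothetical sketch of a unified CSR + video-to-speech pipeline.
# Module names, dimensions, and wiring are illustrative assumptions and do not
# reproduce the actual UniCUE architecture.
import torch
import torch.nn as nn


class UnifiedCSV2S(nn.Module):
    def __init__(self, vocab_size=64, d_model=256, n_mels=80):
        super().__init__()
        # Shared visual encoder over lip + hand-cue frame features
        # (per-frame features assumed precomputed, e.g. 512-d).
        self.visual_encoder = nn.GRU(input_size=512, hidden_size=d_model,
                                     batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * d_model, d_model)
        # Recognition head: frame-level phoneme logits (e.g. trained with CTC).
        self.csr_head = nn.Linear(d_model, vocab_size)
        # Generation head: mel-spectrogram frames conditioned on the visual
        # features plus the recognition posteriors, bypassing text decoding.
        self.speech_decoder = nn.Sequential(
            nn.Linear(d_model + vocab_size, d_model),
            nn.ReLU(),
            nn.Linear(d_model, n_mels),
        )

    def forward(self, visual_feats):                 # (B, T, 512)
        h, _ = self.visual_encoder(visual_feats)     # (B, T, 2*d_model)
        h = self.proj(h)                             # (B, T, d_model)
        csr_logits = self.csr_head(h)                # fine-grained visual-semantic cues
        cues = csr_logits.softmax(dim=-1)
        mel = self.speech_decoder(torch.cat([h, cues], dim=-1))  # (B, T, n_mels)
        return csr_logits, mel


# Example: a 2-second clip at 30 fps -> 60 frames of 512-d visual features.
model = UnifiedCSV2S()
logits, mel = model(torch.randn(1, 60, 512))
print(logits.shape, mel.shape)  # torch.Size([1, 60, 64]) torch.Size([1, 60, 80])
```

In this sketch the recognition branch is trained jointly with the generator, so its posteriors act as the fine-grained cues described above rather than as a hard text transcript.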


Recommended Readings
CAT-Net: A Cross-Attention Tone Network for Cross-Subject EEG-EMG Fusion Tone Decoding
Positive · Artificial Intelligence
The study presents CAT-Net, a novel cross-subject multimodal brain-computer interface (BCI) decoding framework that integrates electroencephalography (EEG) and electromyography (EMG) signals to classify four Mandarin tones. This approach addresses the challenges of tonal variations in Mandarin, which can alter meanings despite identical phonemes. The framework demonstrates strong performance, achieving classification accuracies of 87.83% for audible speech and 88.08% for silent speech across 4800 EEG and 4800 EMG trials with 10 participants.
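The fusion idea can be illustrated with a minimal cross-attention sketch in PyTorch, where EEG features act as queries over EMG keys and values before a four-way tone classifier. The layer sizes and the single attention block are assumptions made for illustration and do not reproduce the published CAT-Net architecture.

```python
# A minimal, hypothetical cross-attention fusion sketch for EEG-EMG tone
# classification; sizes and the single attention block are illustrative
# assumptions, not the published CAT-Net design.
import torch
import torch.nn as nn


class CrossAttentionToneClassifier(nn.Module):
    def __init__(self, eeg_dim=64, emg_dim=8, d_model=128, n_tones=4):
        super().__init__()
        self.eeg_proj = nn.Linear(eeg_dim, d_model)
        self.emg_proj = nn.Linear(emg_dim, d_model)
        # EEG queries attend over EMG keys/values (cross-attention fusion).
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.classifier = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, n_tones),  # four Mandarin tones
        )

    def forward(self, eeg, emg):          # eeg: (B, T, 64), emg: (B, T, 8)
        q = self.eeg_proj(eeg)
        kv = self.emg_proj(emg)
        fused, _ = self.cross_attn(q, kv, kv)      # (B, T, d_model)
        return self.classifier(fused.mean(dim=1))  # pool over time -> tone logits


# Example: two trials with 256 time steps of 64-channel EEG and 8-channel EMG.
model = CrossAttentionToneClassifier()
logits = model(torch.randn(2, 256, 64), torch.randn(2, 256, 8))
print(logits.shape)  # torch.Size([2, 4])
```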