UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation
UniCUE is a unified framework for assistive technology that targets the communication needs of hearing-impaired people through Cued Speech Video-to-Speech generation (CSV2S). Earlier methods relied primarily on Cued Speech Recognition (CSR) to transcribe the video into text before producing speech, so recognition errors could propagate and cause misalignment in the generated speech. UniCUE removes the intermediate text step and generates intelligible speech directly from Cued Speech videos. This is significant because the task involves complex multimodal data and Cued Speech datasets are scarce. The framework integrates the CSR task to provide fine-grained visual-semantic cues that guide speech generation, improving the accuracy and effectiveness of communication for users. A large-scale Mandarin Cued Speech dataset, UniCUE-HI, was developed to support this work, paving the way for more robus…
— via World Pulse Now AI Editorial System
