TEDxTN: A Three-way Speech Translation Corpus for Code-Switched Tunisian Arabic - English
- The TEDxTN project has released the first publicly available speech translation dataset for code-switched Tunisian Arabic to English, comprising 108 TEDx talks that total 25 hours of speech. The corpus addresses the data scarcity that affects Arabic dialects and covers speakers with accents from 11 regions of Tunisia.
- The dataset matters because it could advance natural language processing research on Tunisian dialects, giving researchers and developers in AI and linguistics a resource they can build on.
- While there are no directly related articles, the TEDxTN dataset exemplifies a growing trend in creating open
— via World Pulse Now AI Editorial System
