POTSA: A Cross-Lingual Speech Alignment Framework for Low Resource Speech-to-Text Translation
POTSA is a new cross-lingual speech alignment framework that targets the bias in translation performance that arises when semantic commonalities across languages are overlooked. It combines a Bias Compensation module with token-level Optimal Transport constraints to align speech representations across languages. In experiments on the FLEURS dataset, it achieved an average improvement of 0.93 BLEU across five common languages and 5.05 BLEU for zero-shot languages, using only 10 hours of parallel speech data per source language. The results suggest the approach can narrow the gap between high- and low-resource languages in speech-to-text translation.
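For readers unfamiliar with token-level Optimal Transport, the following is a minimal sketch of how such an alignment cost can be computed between two sequences of token vectors. It is not POTSA's implementation: the function names, the cosine cost, the uniform token weights, the entropy-regularized Sinkhorn solver, and the `eps` value are all assumptions made for illustration.

```python
import math

def cosine_cost(xs, ys):
    """Cost matrix C[i][j] = 1 - cosine similarity (assumed cost, not from the paper)."""
    def norm(v):
        return math.sqrt(sum(a * a for a in v)) or 1.0
    return [[1.0 - sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y))
             for y in ys] for x in xs]

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized transport plan between uniform token weights."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n                      # uniform mass on source tokens
    b = [1.0 / m] * m                      # uniform mass on target tokens
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def ot_alignment_loss(xs, ys, eps=0.1):
    """Transport cost <plan, cost>; small when the two token sequences align well."""
    cost = cosine_cost(xs, ys)
    plan = sinkhorn(cost, eps)
    return sum(plan[i][j] * cost[i][j]
               for i in range(len(xs)) for j in range(len(ys)))

# Toy token vectors standing in for speech representations (hypothetical data).
source_tokens = [[1.0, 0.0], [0.0, 1.0]]
aligned_tokens = [[1.0, 0.0], [0.0, 1.0]]
misaligned_tokens = [[-1.0, 0.0], [0.0, -1.0]]

loss_aligned = ot_alignment_loss(source_tokens, aligned_tokens)
loss_misaligned = ot_alignment_loss(source_tokens, misaligned_tokens)
```

In this toy setup the loss is lower for the sequence whose tokens point in the same directions as the source, which is the property a training objective of this kind would exploit when pulling representations of different languages together.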
— via World Pulse Now AI Editorial System
