POTSA: A Cross-Lingual Speech Alignment Framework for Low Resource Speech-to-Text Translation

arXiv (cs.CL), Thursday, November 13, 2025 at 5:00:00 AM
POTSA is a new framework for cross-lingual speech alignment in speech-to-text translation. It targets the bias in translation performance that arises when models overlook semantic commonalities across languages. The framework combines a Bias Compensation module with token-level Optimal Transport constraints to align speech representations across languages. In experiments on the FLEURS dataset, POTSA improved BLEU by an average of 0.93 across five common languages and by 5.05 on zero-shot languages, using only 10 hours of parallel speech data per source language. These results suggest the approach can narrow the gap between high- and low-resource languages in multilingual speech translation.
— via World Pulse Now AI Editorial System
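To make the idea of a token-level Optimal Transport constraint more concrete, the following is a minimal sketch of an entropy-regularized OT (Sinkhorn) alignment cost between two sequences of token embeddings. It is not POTSA's implementation: the summary does not specify the cost function, the regularization, or how the constraint enters training, so the cosine-distance cost, the uniform token weights, the `eps` and `n_iters` values, and all function names here are assumptions chosen for illustration.

```python
import math


def cosine_distance(p, q):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q + 1e-12)


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Approximate the OT plan for a cost matrix with uniform marginals.

    Assumption: every token carries equal mass (1/n and 1/m).
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass on each source token
    b = [1.0 / m] * m  # mass on each target token
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_alignment_loss(src_tokens, tgt_tokens, eps=0.1, n_iters=200):
    """Return (transport cost, plan) between two token-embedding sequences."""
    cost = [[cosine_distance(s, t) for t in tgt_tokens] for s in src_tokens]
    plan = sinkhorn_plan(cost, eps, n_iters)
    loss = sum(
        plan[i][j] * cost[i][j]
        for i in range(len(src_tokens))
        for j in range(len(tgt_tokens))
    )
    return loss, plan


if __name__ == "__main__":
    # Toy 2-dimensional "token embeddings" for two utterances (made-up values).
    utterance_a = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    utterance_b = [[0.9, 0.1], [0.1, 0.9]]
    loss, plan = ot_alignment_loss(utterance_a, utterance_b)
    print("alignment loss:", loss)
    for row in plan:
        print([round(p, 3) for p in row])
```

In a training setup, a differentiable version of such a cost could be added to the translation loss so that token representations of semantically equivalent utterances in different languages are pulled together. The sequences may have different lengths, since the transport plan spreads each token's mass over the tokens of the other sequence.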
