POTSA: A Cross-Lingual Speech Alignment Framework for Low Resource Speech-to-Text Translation

arXiv (cs.CL), Thursday, November 13, 2025 at 5:00:00 AM
POTSA is a new cross-lingual speech alignment framework for low-resource speech-to-text translation. It targets the uneven translation performance across languages that arises when models overlook the semantic commonalities between them. The framework combines a Bias Compensation module with token-level Optimal Transport constraints to align speech representations across languages. In experiments on the FLEURS dataset, POTSA improved BLEU by an average of 0.93 across five common languages and by 5.05 on zero-shot languages, using only 10 hours of parallel speech data per source language. The results point toward narrowing the gap between high- and low-resource languages in multilingual speech translation.
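To make the idea of a token-level Optimal Transport constraint concrete, here is a minimal sketch of an entropic OT alignment cost between two sequences of token embeddings. It is not the paper's implementation: the cosine cost, uniform token weights, Sinkhorn solver, and the names `cosine_cost`, `sinkhorn_plan`, and `ot_alignment_loss` are all illustrative assumptions.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x (n, d) and y (m, d)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropic-regularized transport plan with uniform marginals (assumed)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # mass on each source token
    b = np.full(m, 1.0 / m)   # mass on each target token
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)       # rescale rows toward marginal a
        v = b / (K.T @ u)     # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


def ot_alignment_loss(src_tokens, tgt_tokens, eps=0.1, n_iters=200):
    """Transport cost between two token sequences; lower means better aligned."""
    cost = cosine_cost(src_tokens, tgt_tokens)
    plan = sinkhorn_plan(cost, eps, n_iters)
    return float(np.sum(plan * cost)), plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(5, 16))   # hypothetical speech token embeddings
    tgt = rng.normal(size=(7, 16))   # hypothetical embeddings, other language
    loss, plan = ot_alignment_loss(src, tgt)
    print(loss, plan.shape)
```

In a training setup, a term like this would typically be computed in a differentiable framework and added to the translation loss as an auxiliary constraint; how POTSA weights or applies it is not stated in this summary.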
— via World Pulse Now AI Editorial System


Recommended Readings
OT-ALD: Aligning Latent Distributions with Optimal Transport for Accelerated Image-to-Image Translation
OT-ALD is a new framework for image-to-image translation. It addresses two weaknesses of the Dual Diffusion Implicit Bridge (DDIB): low translation efficiency and trajectory deviations caused by mismatched latent distributions. By applying optimal transport theory to align those latent distributions, OT-ALD improves sampling efficiency by 20.29% and reduces the FID score by 2.6.