A Critical Study of Automatic Evaluation in Sign Language Translation

arXiv — cs.CL · Monday, November 17, 2025 at 5:00:00 AM
A recent study published on arXiv investigates how well automatic evaluation metrics work for sign language translation (SLT). Widely used metrics such as BLEU and ROUGE were designed for written text, which raises questions about how reliably they assess SLT outputs. The study analyzes six metrics, including lexical overlap measures (BLEU, chrF, and ROUGE) and LLM-based evaluators (G-Eval and GEMBA), under controlled conditions. The results expose the limitations of lexical overlap metrics and show that LLM-based evaluators are better at capturing semantic equivalence.
— via World Pulse Now AI Editorial System
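
To make the contrast concrete, here is a minimal Python sketch, illustrative only and not code or data from the paper: it scores a semantically equivalent paraphrase with two lexical overlap metrics via the open-source sacrebleu package, then builds a GEMBA-style judging prompt for the same output. The example sentences and the prompt wording are invented for illustration.

    # Illustrative sketch: sentences and prompt are invented, not from the paper.
    # Requires: pip install sacrebleu
    import sacrebleu

    reference = "the weather will turn cold tomorrow"
    # A semantically equivalent paraphrase with little surface overlap,
    # the kind of valid SLT output that lexical metrics penalize.
    hypothesis = "tomorrow it is going to get chilly"

    bleu = sacrebleu.sentence_bleu(hypothesis, [reference])
    chrf = sacrebleu.sentence_chrf(hypothesis, [reference])
    print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")

    # A GEMBA-style evaluator instead asks an LLM to rate meaning
    # preservation directly; this prompt is a hypothetical paraphrase
    # of that idea, not the study's exact template.
    judge_prompt = (
        "Score the candidate translation from 0 (no meaning preserved) "
        "to 100 (perfect meaning preservation). Reply with the score only.\n"
        f"Reference: {reference}\n"
        f"Candidate: {hypothesis}"
    )
    print(judge_prompt)  # send to an LLM judge of your choice

Run on a pair like this, BLEU and chrF will typically come out low despite full meaning preservation, which is exactly the failure mode the study attributes to lexical overlap metrics; an LLM judge sees both sentences whole and can rate semantic adequacy directly.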
