A Critical Study of Automatic Evaluation in Sign Language Translation
Neutral · Artificial Intelligence
A recent study published on arXiv examines how well automatic evaluation metrics assess sign language translation (SLT). Widely used metrics such as BLEU and ROUGE are text-based, raising questions about their reliability for SLT outputs. The study analyzes six metrics, including BLEU, chrF, and ROUGE, alongside LLM-based evaluators such as G-Eval and GEMBA, under controlled conditions. The results reveal the limitations of lexical-overlap metrics and highlight the advantages of LLM-based evaluators in capturing semantic equivalence.
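As a rough illustration of the lexical-overlap metrics discussed above, the sketch below scores an SLT hypothesis against a reference with BLEU and chrF using the sacrebleu package. The sentences and setup are invented for demonstration and are not drawn from the study itself.

```python
# Minimal sketch (illustrative only, not the paper's evaluation pipeline):
# score hypotheses against references with the lexical-overlap metrics
# BLEU and chrF via the sacrebleu package.
from sacrebleu.metrics import BLEU, CHRF

# One reference stream (list of reference strings, one per hypothesis).
references = [["the weather will be sunny tomorrow"]]
# System outputs to evaluate (invented example sentence).
hypotheses = ["tomorrow the weather is going to be sunny"]

bleu = BLEU().corpus_score(hypotheses, references)
chrf = CHRF().corpus_score(hypotheses, references)

# Both metrics reward surface n-gram or character overlap, so a semantically
# equivalent but differently worded hypothesis can still score low -- the
# kind of limitation the study attributes to lexical-overlap metrics for SLT.
print(bleu)
print(chrf)
```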
— via World Pulse Now AI Editorial System