Aligning ASR Evaluation with Human and LLM Judgments: Intelligibility Metrics Using Phonetic, Semantic, and NLI Approaches

arXiv (cs.LG) · Friday, December 12, 2025 at 5:00:00 AM
  • A new study proposes an evaluation metric for Automatic Speech Recognition (ASR) systems that measures intelligibility rather than surface-level error counts such as Word Error Rate (WER) and Character Error Rate (CER). The metric combines Natural Language Inference (NLI) scores, semantic similarity, and phonetic similarity, and is reported to correlate highly with human judgments, particularly for dysarthric and dysphonic speech.
  • The work addresses a known weakness of existing ASR evaluation methods: they often fail to reflect how intelligible a transcript is, especially in clinical settings. By prioritizing intelligibility, the metric aims to make ASR technologies more accessible to people with speech impairments.
  • The metric fits into ongoing discussion in the field about the limits of traditional ASR evaluation and the potential of Large Language Models (LLMs) to improve ASR outputs. As ASR systems evolve, semantic understanding and context awareness are receiving growing emphasis, which may lead to more effective communication tools for diverse user needs.
— via World Pulse Now AI Editorial System
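
To make the contrast in the summary concrete, here is a minimal Python sketch. It computes WER in the standard way (word-level edit distance divided by reference length) and then shows one possible way to fold three component scores into a single number. The summary does not say how the study combines its NLI, semantic, and phonetic scores, so the function `combined_intelligibility`, its weighted-average form, and its equal default weights are assumptions made purely for illustration. The component scores are expected to come from external models and are passed in as plain numbers in [0, 1].

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def combined_intelligibility(
    nli: float,
    semantic: float,
    phonetic: float,
    weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Hypothetical combination: weighted average of three scores in [0, 1].

    ASSUMPTION: the weighted-average form and the equal weights are
    illustrative only and are not taken from the study.
    """
    scores = (nli, semantic, phonetic)
    if len(weights) != 3:
        raise ValueError("expected exactly three weights")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("component scores must lie in [0, 1]")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

A transcript such as "turn the lights on" scored against the reference "turn on the light" gets a high WER even though a listener would understand it. That gap between error counts and understanding is what an intelligibility-oriented metric is meant to close.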

Continue Reading
Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles
Positive · Artificial Intelligence
A new framework called ThinkDeeper has been proposed to enhance the interpretation of natural-language commands for autonomous vehicles, addressing challenges in visual grounding methods that struggle with ambiguous instructions. This framework incorporates a Spatial-Aware World Model (SA-WM) to anticipate future spatial states, improving localization accuracy.
Detailed balance in large language model-driven agents
Neutral · Artificial Intelligence
Large language model (LLM)-driven agents are gaining traction as a novel approach to tackle complex problems, with recent research proposing a method based on the least action principle to understand their generative dynamics. This study reveals a detailed balance in LLM-generated transitions, suggesting that LLMs may learn underlying potential functions rather than explicit rules.
LLM-Auction: Generative Auction towards LLM-Native Advertising
Positive · Artificial Intelligence
The recent introduction of LLM-Auction marks a significant advancement in the monetization strategies for large language models (LLMs), proposing a generative auction mechanism that integrates advertisement placement within LLM-generated responses. This innovative approach addresses the challenges posed by traditional auction mechanisms that separate ad allocation from LLM generation, which can be impractical for real-world applications.
LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding
Positive · Artificial Intelligence
A new study introduces an LLM-driven composite neural architecture search (NAS) aimed at optimizing state encoders for reinforcement learning (RL) that utilize multiple information sources, such as sensor data and textual instructions. This approach addresses the limitations of existing NAS methods that often neglect valuable intermediate output information, thereby enhancing sample efficiency in multi-source RL scenarios.
Metaphor-based Jailbreaking Attacks on Text-to-Image Models
Neutral · Artificial Intelligence
Recent advancements in text-to-image (T2I) models have been challenged by the introduction of MJA, a metaphor-based jailbreaking attack method that effectively bypasses existing defense mechanisms. This method leverages metaphorical prompts to induce T2I models to generate sensitive content, highlighting significant vulnerabilities in current AI safety protocols.