Aligning ASR Evaluation with Human and LLM Judgments: Intelligibility Metrics Using Phonetic, Semantic, and NLI Approaches
Positive · Artificial Intelligence
- A new study introduces an evaluation metric for Automatic Speech Recognition (ASR) systems that targets intelligibility rather than traditional measures such as Word Error Rate (WER) and Character Error Rate (CER). The metric combines Natural Language Inference (NLI) scores, semantic similarity, and phonetic similarity, and correlates strongly with human judgments, particularly for dysarthric and dysphonic speech (a minimal sketch of such a combined score follows this list).
- This development is significant because existing ASR evaluation methods often fail to reflect how intelligible speech actually is, especially in clinical settings. By prioritizing intelligibility, the new metric aims to make ASR technologies more accessible to individuals with speech impairments.
- The introduction of this metric aligns with ongoing discussions in the field about the limitations of traditional ASR evaluation and the potential of Large Language Models (LLMs) to improve ASR outputs. As ASR systems evolve, there is a growing emphasis on integrating semantic understanding and context-aware technologies, which may lead to more effective communication tools for users with diverse needs.
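
The summary does not specify the study's component models or weighting scheme, so the following is only a minimal sketch of how NLI entailment, semantic similarity, and phonetic similarity might be blended into one intelligibility score. The model choices (`roberta-large-mnli`, `all-MiniLM-L6-v2`), the Metaphone-based phonetic proxy, and the equal weights are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch: scoring an ASR hypothesis against a reference transcript by
# combining NLI, semantic, and phonetic components. All model names and
# weights below are assumptions for illustration.
import torch
import jellyfish
from sentence_transformers import SentenceTransformer, util
from transformers import AutoTokenizer, AutoModelForSequenceClassification

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

nli_name = "roberta-large-mnli"  # assumed NLI model
nli_tokenizer = AutoTokenizer.from_pretrained(nli_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(nli_name)

def semantic_similarity(reference: str, hypothesis: str) -> float:
    # Cosine similarity between sentence embeddings.
    emb = embedder.encode([reference, hypothesis], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1]))

def nli_entailment(reference: str, hypothesis: str) -> float:
    # Probability that the reference entails the ASR hypothesis.
    inputs = nli_tokenizer(reference, hypothesis,
                           return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli_model(**inputs).logits
    probs = logits.softmax(dim=-1)[0]
    # roberta-large-mnli label order: contradiction, neutral, entailment.
    return float(probs[2])

def phonetic_similarity(reference: str, hypothesis: str) -> float:
    # Crude proxy: normalized edit distance between Metaphone encodings.
    # A faithful implementation would compare phoneme sequences from a
    # grapheme-to-phoneme model instead.
    ref = jellyfish.metaphone(reference)
    hyp = jellyfish.metaphone(hypothesis)
    if not ref and not hyp:
        return 1.0
    dist = jellyfish.levenshtein_distance(ref, hyp)
    return 1.0 - dist / max(len(ref), len(hyp))

def intelligibility_score(reference: str, hypothesis: str,
                          weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted blend of NLI, semantic, and phonetic components (weights assumed)."""
    components = (
        nli_entailment(reference, hypothesis),
        semantic_similarity(reference, hypothesis),
        phonetic_similarity(reference, hypothesis),
    )
    return sum(w * c for w, c in zip(weights, components))

# A hypothesis with a minor grammatical error should still score as highly
# intelligible, unlike under strict word-level error rates.
print(intelligibility_score("the patient needs water",
                            "the patient need water"))
```

The point of the blend is that each component covers a different failure mode: NLI penalizes hypotheses that change the meaning, semantic similarity credits paraphrases that WER would punish, and phonetic similarity credits near-miss transcriptions of hard-to-recognize speech.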
— via World Pulse Now AI Editorial System
