Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual Speech Recognition Evaluation

arXiv — cs.CL · Thursday, December 11, 2025 at 5:00:00 AM
  • The Open ASR Leaderboard has been launched as a comprehensive benchmark for automatic speech recognition (ASR) systems, featuring over 60 systems evaluated across 11 datasets, including a multilingual track. The initiative aims to address the current evaluation bias towards short-form English and to standardize reporting of both accuracy (word error rate, WER) and efficiency (inverse real-time factor, RTFx); a minimal sketch of both metrics follows after this summary.
  • This development is significant as it promotes reproducibility and transparency in ASR evaluations, allowing researchers and developers to make informed comparisons between various systems. By standardizing metrics, the leaderboard encourages advancements in multilingual capabilities and efficiency in speech recognition technologies.
  • The introduction of the Open ASR Leaderboard reflects a growing recognition of the need for diverse language representation in ASR systems. As the field evolves, challenges such as alignment inaccuracies and the need to fine-tune models for specific languages, as seen with the Whisper model, highlight ongoing efforts to improve performance across different linguistic contexts.
— via World Pulse Now AI Editorial System
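For context, the two headline metrics are straightforward to compute. Below is a minimal Python sketch, assuming the standard definitions: WER as word-level edit distance divided by the number of reference words, and RTFx (inverse real-time factor) as seconds of audio transcribed per second of compute. The function names are illustrative and are not taken from the Open ASR Leaderboard codebase.

# Minimal sketch of the two metrics, assuming the standard definitions:
# WER = word-level edit distance / reference word count,
# RTFx = seconds of audio transcribed / seconds of compute time.
# Names are illustrative, not from the Open ASR Leaderboard codebase.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def inverse_real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """RTFx: how many seconds of audio are transcribed per second of compute."""
    return audio_seconds / processing_seconds

if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    rtfx = inverse_real_time_factor(audio_seconds=3600.0, processing_seconds=120.0)
    print(f"WER: {wer:.3f}")    # one deletion over 6 reference words -> 0.167
    print(f"RTFx: {rtfx:.1f}")  # 30x faster than real time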

Continue Reading
Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles
Positive · Artificial Intelligence
A new framework called ThinkDeeper has been proposed to enhance the interpretation of natural-language commands for autonomous vehicles, addressing challenges in visual grounding methods that struggle with ambiguous instructions. This framework incorporates a Spatial-Aware World Model (SA-WM) to anticipate future spatial states, improving localization accuracy.
Detailed balance in large language model-driven agents
Neutral · Artificial Intelligence
Large language model (LLM)-driven agents are gaining traction as a novel approach to tackle complex problems, with recent research proposing a method based on the least action principle to understand their generative dynamics. This study reveals a detailed balance in LLM-generated transitions, suggesting that LLMs may learn underlying potential functions rather than explicit rules.
Semantic-Aware Confidence Calibration for Automated Audio Captioning
Positive · Artificial Intelligence
A new framework has been introduced for automated audio captioning that integrates confidence prediction and redefines correctness through semantic similarity. This approach addresses the issue of overconfident predictions in audio captioning models, which often lack semantic accuracy. By employing CLAP audio-text embeddings and a learned confidence prediction head, the model enhances the reliability of audio captioning outputs.
LLM-Auction: Generative Auction towards LLM-Native Advertising
Positive · Artificial Intelligence
The recent introduction of LLM-Auction marks a significant advancement in the monetization strategies for large language models (LLMs), proposing a generative auction mechanism that integrates advertisement placement within LLM-generated responses. This innovative approach addresses the challenges posed by traditional auction mechanisms that separate ad allocation from LLM generation, which can be impractical for real-world applications.
Aligning ASR Evaluation with Human and LLM Judgments: Intelligibility Metrics Using Phonetic, Semantic, and NLI Approaches
Positive · Artificial Intelligence
A new study has introduced a novel evaluation metric for Automatic Speech Recognition (ASR) systems, focusing on intelligibility rather than traditional metrics like Word Error Rate (WER) and Character Error Rate (CER). The proposed metric integrates Natural Language Inference (NLI) scores, semantic similarity, and phonetic similarity, achieving a high correlation with human judgments, particularly for dysarthric and dysphonic speech.
LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding
Positive · Artificial Intelligence
A new study introduces an LLM-driven composite neural architecture search (NAS) aimed at optimizing state encoders for reinforcement learning (RL) that utilize multiple information sources, such as sensor data and textual instructions. This approach addresses the limitations of existing NAS methods that often neglect valuable intermediate output information, thereby enhancing sample efficiency in multi-source RL scenarios.
Metaphor-based Jailbreaking Attacks on Text-to-Image Models
Neutral · Artificial Intelligence
Recent advancements in text-to-image (T2I) models have been challenged by the introduction of MJA, a metaphor-based jailbreaking attack method that effectively bypasses existing defense mechanisms. This method leverages metaphorical prompts to induce T2I models to generate sensitive content, highlighting significant vulnerabilities in current AI safety protocols.
