arXiv:2511.20107v1 Announce Type: new 
Abstract: Mispronunciation Detection and Diagnosis (MDD) is crucial for language learning and speech therapy. Unlike conventional methods that require scoring models or training phoneme-level models, we propose a novel training-free framework that leverages retrieval techniques with a pretrained Automatic Speech Recognition model. Our method avoids phoneme-specific modeling or additional task-specific training, while still achieving accurate detection and diagnosis of pronunciation errors. Experiments on the L2-ARCTIC dataset show that our method achieves a superior F1 score of 69.60% while avoiding the complexity of model training.

The paper proposes a training-free framework for Mispronunciation Detection and Diagnosis (MDD) that pairs retrieval techniques with a pretrained Automatic Speech Recognition (ASR) model, so no scoring model or phoneme-level model has to be trained. On the L2-ARCTIC dataset the method reaches an F1 score of 69.60%, which the authors describe as superior, while avoiding the complexity of model training.
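The abstract does not spell out the retrieval procedure, so the following is only a minimal sketch of what retrieval-based detection could look like, not the authors' method. It assumes that each speech segment has already been turned into an embedding by some pretrained ASR encoder, that a pool of reference embeddings with phoneme labels is available, and that "mispronounced" is the positive class when computing F1. The names `retrieve_phoneme`, `detect`, and `f1_score` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_phoneme(query, pool, k=3):
    """Majority phoneme label among the k pool entries most similar to query.

    pool is a list of (embedding, phoneme_label) pairs (an assumed layout).
    """
    ranked = sorted(pool, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def detect(segments, canonical, pool, k=3):
    """Return one (is_mispronounced, retrieved_phoneme) pair per segment.

    Detection: the retrieved phoneme differs from the canonical one.
    Diagnosis: the retrieved phoneme itself.
    """
    results = []
    for embedding, expected in zip(segments, canonical):
        heard = retrieve_phoneme(embedding, pool, k)
        results.append((heard != expected, heard))
    return results


def f1_score(predicted, actual):
    """F1 over boolean flags, with True meaning 'mispronounced'."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Nothing in this sketch is trained: the only "model" is the fixed pool of labeled embeddings, which is the sense in which a retrieval approach can sidestep phoneme-specific training. The real system's embedding source, similarity measure, and decision rule may differ.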

Mispronunciation Detection and Diagnosis Without Model Training: A Retrieval-Based Approach
