arXiv:2511.08613v1 Announce Type: new 
Abstract: Inpainting-based talking face generation aims to preserve video details such as pose, lighting, and gestures while modifying only lip motion, often using an identity reference image to maintain speaker consistency. However, this mechanism can introduce lip leakage, where generated lips are influenced by the reference image rather than solely by the driving audio. Such leakage is difficult to detect with standard metrics and conventional test setups. To address this, we propose a systematic evaluation methodology to analyze and quantify lip leakage. Our framework employs three complementary test setups: silent-input generation, mismatched audio-video pairing, and matched audio-video synthesis. We also introduce derived metrics including lip-sync discrepancy and silent-audio-based lip-sync scores. In addition, we study how different identity reference selections affect leakage, providing insights into reference design. The proposed methodology is model-agnostic and establishes a more reliable benchmark for future research in talking face generation.
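
The three test setups named in the abstract can be pictured as a pairing scheme between each video and a driving audio. The following is a minimal sketch, not the authors' implementation: the `Clip` and `EvalCase` containers, the `build_eval_cases` helper, and the rule that a mismatched pair borrows the audio of the next clip in the list are all assumptions made for illustration. The derived metrics are not sketched, since the abstract does not define them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Clip:
    # Hypothetical container: identifiers for one source video and its own audio.
    video_id: str
    audio_id: str


@dataclass(frozen=True)
class EvalCase:
    setup: str                # "silent", "mismatched", or "matched"
    video_id: str             # video whose lip region would be regenerated
    audio_id: Optional[str]   # driving audio; None stands for silent input


def build_eval_cases(clips: List[Clip]) -> List[EvalCase]:
    """Enumerate the three setups for every clip.

    Assumption: a mismatched pair uses the audio of the next clip in the
    list (cyclic shift). The paper may select mismatched audio differently.
    """
    if len(clips) < 2:
        raise ValueError("need at least two clips to form mismatched pairs")
    cases: List[EvalCase] = []
    for i, clip in enumerate(clips):
        other = clips[(i + 1) % len(clips)]
        cases.append(EvalCase("silent", clip.video_id, None))
        cases.append(EvalCase("mismatched", clip.video_id, other.audio_id))
        cases.append(EvalCase("matched", clip.video_id, clip.audio_id))
    return cases


if __name__ == "__main__":
    demo = [Clip("v0", "a0"), Clip("v1", "a1"), Clip("v2", "a2")]
    for case in build_eval_cases(demo):
        print(case)
```

Each resulting case would then be passed to the model under test and scored with a lip-sync metric. In the silent and mismatched cases, lip motion that still resembles the reference image rather than the driving audio would point to leakage.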

A new study posted on arXiv addresses identity leakage in talking face generation, where generated lip movements are influenced by the identity reference image rather than solely by the driving audio. It introduces a systematic, model-agnostic evaluation methodology that quantifies lip leakage using three test setups (silent-input generation, mismatched audio-video pairing, and matched audio-video synthesis) together with derived metrics such as lip-sync discrepancy and silent-audio-based lip-sync scores. The authors present the methodology as a more reliable benchmark for future research in talking face generation.

Assessing Identity Leakage in Talking Face Generation: Metrics and Evaluation Framework
