arXiv:2506.05047v4 Announce Type: replace 
Abstract: The distribution of data changes over time; models operating in dynamic environments need retraining. But knowing when to retrain, without access to labels, is an open challenge since some, but not all shifts degrade model performance. This paper formalizes and addresses the problem of post-deployment deterioration (PDD) monitoring. We propose D3M, a practical and efficient monitoring algorithm based on the disagreement of predictive models, achieving low false positive rates under non-deteriorating shifts and provides sample complexity bounds for high true positive rates under deteriorating shifts. Empirical results on both standard benchmark and a real-world large-scale internal medicine dataset demonstrate the effectiveness of the framework and highlight its viability as an alert mechanism for high-stakes machine learning pipelines.

تناقش هذه المقالة نهجًا جديدًا لمراقبة أداء النماذج في البيئات المتغيرة دون الحاجة إلى تسميات. يقدم D3M، وهو خوارزمية فعالة تساعد في تحديد متى تحتاج النماذج إلى إعادة التدريب من خلال تحليل الاختلاف بين النماذج التنبؤية، مما يعالج تحديًا كبيرًا في تعلم الآلة.

Este artículo discute un nuevo enfoque para monitorear el rendimiento de los modelos en entornos cambiantes sin necesidad de etiquetas. Presenta D3M, un algoritmo eficiente que ayuda a identificar cuándo los modelos necesitan ser reentrenados al analizar el desacuerdo entre los modelos predictivos, abordando un desafío significativo en el aprendizaje automático.

Cet article aborde une nouvelle approche pour surveiller la performance des modèles dans des environnements changeants sans avoir besoin d'étiquettes. Il présente D3M, un algorithme efficace qui aide à identifier quand les modèles doivent être réentraînés en analysant le désaccord entre les modèles prédictifs, répondant ainsi à un défi majeur en apprentissage automatique.

This article discusses a new approach to monitor model performance in changing environments without needing labels. It introduces D3M, an efficient algorithm that helps identify when models need retraining by analyzing the disagreement among predictive models, addressing a significant challenge in machine learning.

Reliably Detecting Model Failures in Deployment Without Labels

Was this article worth reading? Share it

Ready to build your own newsroom?