arXiv:2511.07473v2 Announce Type: replace 
Abstract: Objective: Electronic health record (EHR) phenotyping often relies on noisy proxy labels, which undermine the reliability of downstream risk prediction. Active learning can reduce annotation costs, but most active learning methods rely on fixed heuristics and do not ensure that phenotype refinement improves prediction performance. Our goal was to develop a framework that directly uses downstream prediction performance as feedback to guide phenotype correction and sample selection under constrained labeling budgets.
  Materials and Methods: We propose Reinforcement-Enhanced Label-Efficient Active Phenotyping (RELEAP), a reinforcement learning-based active learning framework. RELEAP adaptively integrates multiple querying strategies and, unlike prior methods, updates its policy based on feedback from downstream models. We evaluated RELEAP on a de-identified Duke University Health System (DUHS) cohort (2014-2024) for incident lung cancer risk prediction, using logistic regression and penalized Cox survival models. Performance was benchmarked against noisy-label baselines and single-strategy active learning.
  Results: RELEAP consistently outperformed all baselines. Logistic AUC increased from 0.774 to 0.805 and survival C-index from 0.718 to 0.752. Using downstream performance as feedback, RELEAP produced smoother and more stable gains than heuristic methods under the same labeling budget.
  Discussion: By linking phenotype refinement to prediction outcomes, RELEAP learns which samples most improve downstream discrimination and calibration, offering a more principled alternative to fixed active learning rules.
  Conclusion: RELEAP optimizes phenotype correction through downstream feedback, offering a scalable, label-efficient paradigm that reduces manual chart review and enhances the reliability of EHR-based risk prediction.
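
The feedback loop described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the abstract does not specify the policy, state, or reward design, so an epsilon-greedy bandit stands in for the reinforcement learning policy, a centroid-difference scorer stands in for the logistic and Cox models, and the data are synthetic. The names `run_sketch`, `fit_centroid_model`, and the three query strategies are hypothetical. The only idea carried over from the source is that the choice among several querying strategies is updated using the change in downstream performance after each corrected label.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a positive outscores a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_model(X, y):
    """Toy downstream model (assumption): w = mean(positives) - mean(negatives)."""
    dim = len(X[0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for x, label in zip(X, y):
        counts[label] += 1
        for j in range(dim):
            sums[label][j] += x[j]
    means = {c: [s / max(counts[c], 1) for s in sums[c]] for c in (0, 1)}
    w = [means[1][j] - means[0][j] for j in range(dim)]
    # Bias places the decision boundary midway between the two centroids.
    b = -0.5 * sum(w[j] * (means[1][j] + means[0][j]) for j in range(dim))
    return w, b


def score(model, x):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def run_sketch(budget=30, n_train=300, n_val=150, noise=0.3, epsilon=0.2, seed=0):
    rng = random.Random(seed)

    def sample(n):
        X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
        y = [1 if x[0] + 0.5 * x[1] + rng.gauss(0, 0.5) > 0 else 0 for x in X]
        return X, y

    X, y_true = sample(n_train)        # y_true plays the role of chart review
    X_val, y_val = sample(n_val)       # assumed gold-labeled validation set
    labels = [1 - t if rng.random() < noise else t for t in y_true]  # noisy proxies
    pool = list(range(n_train))        # samples not yet reviewed

    # Candidate querying strategies; each returns one index from the pool.
    strategies = {
        "random": lambda m: rng.choice(pool),
        "uncertainty": lambda m: min(pool, key=lambda i: abs(score(m, X[i]))),
        "disagreement": lambda m: max(
            pool, key=lambda i: score(m, X[i]) * (1 - 2 * labels[i])
        ),
    }
    value = {name: 0.0 for name in strategies}  # running mean reward per strategy
    pulls = {name: 0 for name in strategies}

    model = fit_centroid_model(X, labels)
    current = auc([score(model, x) for x in X_val], y_val)
    history, queried = [current], []

    for _ in range(budget):
        # Epsilon-greedy choice of strategy (stand-in for the learned policy).
        if rng.random() < epsilon:
            name = rng.choice(list(strategies))
        else:
            name = max(value, key=value.get)
        i = strategies[name](model)
        pool.remove(i)
        labels[i] = y_true[i]          # simulated label correction
        queried.append(i)

        # Downstream feedback: retrain and measure the change in validation AUC.
        model = fit_centroid_model(X, labels)
        new = auc([score(model, x) for x in X_val], y_val)
        pulls[name] += 1
        value[name] += ((new - current) - value[name]) / pulls[name]
        current = new
        history.append(current)

    return {"history": history, "queried": queried, "labels": labels,
            "y_true": y_true, "pulls": pulls}
```

Calling `run_sketch(budget=20, seed=1)` returns the validation AUC after each corrected label and the number of times each strategy was chosen. The synthetic data are not meant to reproduce the reported gains; the sketch only shows where the downstream metric enters the selection loop.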

A new framework called Reinforcement-Enhanced Label-Efficient Active Phenotyping (RELEAP) has been developed to improve the reliability of electronic health record (EHR) phenotyping by using reinforcement learning to guide phenotype correction and sample selection. This approach aims to enhance prediction performance for downstream risk assessments, particularly in predicting incident lung cancer risk using data from the Duke University Health System.

RELEAP: Reinforcement-Enhanced Label-Efficient Active Phenotyping for Electronic Health Records
