Step-E: A Differentiable Data Cleaning Framework for Robust Learning with Noisy Labels

arXiv (cs.LG) · Monday, November 24, 2025 at 5:00:00 AM
  • Step-E is a new framework for training deep neural networks on data that contains noisy labels and outliers. It integrates sample selection and model learning into a single optimization process, so the selection of training examples adapts to the noise patterns present in the data. In tests on the CIFAR-100N dataset, Step-E improved the accuracy of a ResNet-18 model from 43.3% to 50.4%, a gain of 7.1 percentage points.
  • Step-E moves away from traditional two-stage data cleaning, where data is cleaned first and the model is trained afterwards, toward an integrated approach that uses feedback from the model itself. It identifies high-loss examples and gradually excludes them from training, which improves model performance and makes learning more robust to a range of data quality issues.
  • This advancement highlights a growing recognition in the AI community of the importance of addressing data quality in machine learning. As models increasingly rely on large datasets collected from diverse sources, the integration of data cleaning with model training becomes crucial. Furthermore, similar approaches in different domains, such as medical image analysis, emphasize the need for innovative strategies to prevent shortcut learning and improve overall model reliability.
— via World Pulse Now AI Editorial System
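The summary above describes the idea (down-weighting high-loss examples gradually, inside the training objective) but gives no formulas. The following is only a minimal sketch of that general idea, not the paper's actual method. Everything in it is an assumption made for illustration: the function names, the sigmoid-shaped soft weight, the `temperature` parameter, the quantile-based cut-off, and the linear schedule with a warm-up phase are all hypothetical.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def keep_fraction_at(epoch, warmup_epochs, total_epochs, final_keep=0.7):
    """Hypothetical schedule: keep every sample during warm-up, then
    shrink the kept fraction linearly down to `final_keep`."""
    if epoch < warmup_epochs:
        return 1.0
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 1.0 - (1.0 - final_keep) * min(1.0, progress)


def soft_keep_weights(losses, keep_fraction, temperature=0.1):
    """Assumed soft selection rule: a weight in (0, 1] per sample.

    The loss at the `keep_fraction` quantile acts as a cut-off. Samples
    well below it get weights near 1, samples well above it get weights
    near 0. The smooth sigmoid stands in for a hard keep/drop decision.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if keep_fraction >= 1.0:
        return [1.0 for _ in losses]
    ranked = sorted(losses)
    index = max(0, math.ceil(keep_fraction * len(ranked)) - 1)
    cutoff = ranked[index]
    return [_sigmoid((cutoff - loss) / temperature) for loss in losses]


def weighted_mean_loss(losses, weights):
    """Training objective for one batch: weight-normalized mean loss."""
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Toy batch: three plausible samples and one likely mislabeled outlier.
batch_losses = [0.2, 0.3, 0.25, 5.0]
weights = soft_keep_weights(batch_losses, keep_fraction=0.75)
objective = weighted_mean_loss(batch_losses, weights)
```

In this toy batch the outlier with loss 5.0 receives a weight close to zero, so it contributes almost nothing to `objective`. In a real training loop the per-sample losses would come from the model (for example, an unreduced cross-entropy), and the weights would be recomputed each step so that selection follows the model's current state.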

Continue Reading
Preventing Shortcut Learning in Medical Image Analysis through Intermediate Layer Knowledge Distillation from Specialist Teachers
Positive · Artificial Intelligence
A new study introduces a knowledge distillation framework aimed at preventing shortcut learning in medical image analysis by utilizing intermediate layer insights from specialized teacher networks. This approach addresses the issue of deep learning models relying on irrelevant features, which can compromise patient safety in high-stakes medical applications.
Self-Supervised Learning by Curvature Alignment
Positive · Artificial Intelligence
A new self-supervised learning framework called CurvSSL has been introduced, which incorporates curvature regularization to enhance the learning process by considering the local geometry of data manifolds. This method builds on existing architectures like Barlow Twins and employs a two-view encoder-projector setup, aiming to improve representation learning in machine learning models.
Weakly Supervised Pneumonia Localization from Chest X-Rays Using Deep Neural Network and Grad-CAM Explanations
Positive · Artificial Intelligence
A recent study has introduced a weakly supervised deep learning framework for pneumonia classification and localization from chest X-ray images. It uses Gradient-weighted Class Activation Mapping (Grad-CAM) to generate heatmaps that highlight affected regions without the need for costly pixel-level annotations. The framework achieved accuracy of 96-98% across various pre-trained models, including ResNet-18 and EfficientNet-B0.