Step-E: A Differentiable Data Cleaning Framework for Robust Learning with Noisy Labels

arXiv — cs.LG • Monday, November 24, 2025 at 5:00:00 AM


Step-E is a new framework for training deep neural networks on data that contain noisy labels and outliers. It integrates sample selection and model learning into a single optimization process, so that the choice of training examples adapts to the noise patterns present in the data. On the CIFAR-100N dataset, Step-E raised the accuracy of a ResNet-18 model from 43.3% to 50.4%, a gain of 7.1 percentage points.
Step-E marks a shift away from the traditional two-stage pipeline, in which data are cleaned before a model is trained, toward an integrated approach driven by feedback from the model itself. It focuses on high-loss examples and gradually excludes them from training, which both improves accuracy and makes learning more robust to a range of data quality issues.
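
The summary does not spell out Step-E's exact selection rule, and the paper's title describes it as differentiable, so the sketch below is only a rough illustration of the general idea of gradually excluding high-loss examples. It uses a common "small-loss" heuristic instead: at each epoch it drops the highest-loss fraction of samples, with that fraction ramping linearly from zero to a cap. The linear schedule, the hard cutoff, and the names `exclusion_rate`, `keep_mask`, `masked_mean_loss`, `max_rate`, and `ramp_epochs` are all hypothetical choices made for illustration.

```python
import math
from typing import List

# Hypothetical helpers illustrating loss-based sample exclusion.
# This is a generic small-loss heuristic, NOT Step-E's actual (differentiable) rule.


def exclusion_rate(epoch: int, max_rate: float, ramp_epochs: int) -> float:
    """Fraction of samples to exclude at `epoch`, ramping linearly from 0 to `max_rate`."""
    if ramp_epochs <= 0:
        return max_rate
    return max_rate * min(1.0, epoch / ramp_epochs)


def keep_mask(losses: List[float], rate: float) -> List[bool]:
    """Return True for samples kept in training; the top `rate` fraction by loss is excluded."""
    n_drop = math.floor(rate * len(losses))
    # Indices ordered from highest to lowest loss (ties keep their original order).
    by_loss_desc = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    dropped = set(by_loss_desc[:n_drop])
    return [i not in dropped for i in range(len(losses))]


def masked_mean_loss(losses: List[float], mask: List[bool]) -> float:
    """Average loss over kept samples; in a real training loop this is what would be minimized."""
    kept = [loss for loss, keep in zip(losses, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    # Toy per-sample losses; the two large values stand in for mislabeled examples.
    per_sample_losses = [0.1, 2.5, 0.3, 0.2, 3.0]
    for epoch in (0, 5, 10):
        rate = exclusion_rate(epoch, max_rate=0.4, ramp_epochs=10)
        mask = keep_mask(per_sample_losses, rate)
        print(epoch, rate, mask, masked_mean_loss(per_sample_losses, mask))
```

Starting with no exclusion lets the model first fit the easy, mostly clean examples, so that later loss values are more informative about which labels are likely wrong.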
Step-E's approach reflects a growing recognition in the AI community that data quality matters in machine learning. As models increasingly rely on large datasets collected from diverse sources, integrating data cleaning with model training becomes crucial. Related work in other domains, such as medical image analysis, likewise stresses the need for new strategies that prevent shortcut learning and improve overall model reliability.

— via World Pulse Now AI Editorial System


Continue Reading

arXiv — cs.CV • a day ago

Preventing Shortcut Learning in Medical Image Analysis through Intermediate Layer Knowledge Distillation from Specialist Teachers


A new study introduces a knowledge distillation framework that aims to prevent shortcut learning in medical image analysis by transferring intermediate-layer knowledge from specialist teacher networks. Shortcut learning occurs when a deep learning model relies on spurious features that correlate with the label but are irrelevant to the task, which can compromise patient safety in high-stakes medical applications.
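
The summary does not give the study's loss function. A common way to distill intermediate-layer knowledge is to add a feature-matching term to the usual classification loss, penalizing the distance between the student's and the teacher's intermediate activations. The NumPy sketch below assumes mean-squared-error matching, feature maps already projected to identical shapes, a hypothetical weight `beta`, and (also an assumption) simple averaging of the matching term over several specialist teachers. It computes loss values only, with no gradients.

```python
from typing import List, Sequence

import numpy as np

# Hypothetical feature-matching distillation loss; not the study's actual formulation.


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-likelihood of the true class."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def feature_matching(student_feats: Sequence[np.ndarray], teacher_feats: Sequence[np.ndarray]) -> float:
    """MSE between matched intermediate feature maps, averaged over layers."""
    return float(np.mean([np.mean((s - t) ** 2) for s, t in zip(student_feats, teacher_feats)]))


def distillation_loss(
    student_logits: np.ndarray,
    labels: np.ndarray,
    student_feats: Sequence[np.ndarray],
    teachers_feats: List[Sequence[np.ndarray]],
    beta: float = 1.0,
) -> float:
    """Cross-entropy plus beta times the feature-matching term averaged over teachers."""
    match = np.mean([feature_matching(student_feats, t) for t in teachers_feats])
    return cross_entropy(student_logits, labels) + beta * float(match)
```

Matching intermediate features, rather than only the teacher's final predictions, is what lets the teacher constrain *which* internal features the student relies on.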


arXiv — stat.ML • a day ago

Self-Supervised Learning by Curvature Alignment


CurvSSL is a new self-supervised learning framework that adds curvature regularization to the learning process, taking into account the local geometry of the data manifold. It builds on existing methods such as Barlow Twins and uses a two-view encoder-projector setup, with the aim of improving the learned representations.
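
The summary does not describe CurvSSL's curvature term, so only the Barlow Twins objective it builds on is sketched here. Barlow Twins standardizes the projector outputs of two augmented views along the batch dimension, computes their cross-correlation matrix $C$, and pushes $C$ toward the identity with the loss $\sum_i (1 - C_{ii})^2 + \lambda \sum_i \sum_{j \ne i} C_{ij}^2$: diagonal entries toward 1 (invariance to augmentation) and off-diagonal entries toward 0 (redundancy reduction). This minimal NumPy version computes the loss value only; the default `lambd` and `eps` are assumed values.

```python
import numpy as np

# Barlow Twins loss value only (no gradients); CurvSSL's curvature term is not included.


def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray, lambd: float = 5e-3, eps: float = 1e-6) -> float:
    """z_a, z_b: (batch, dim) projector outputs for two augmented views of the same batch."""
    n = z_a.shape[0]
    # Standardize each feature along the batch dimension.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    c = z_a.T @ z_b / n                             # (dim, dim) cross-correlation matrix
    diag = np.diag(c)
    on_diag = np.sum((1.0 - diag) ** 2)             # invariance term: C_ii -> 1
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)   # redundancy term: C_ij -> 0 for i != j
    return float(on_diag + lambd * off_diag)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(256, 8))
    print(barlow_twins_loss(z, z))                          # near zero: views agree perfectly
    print(barlow_twins_loss(z, rng.normal(size=(256, 8))))  # large: views are unrelated
```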


arXiv — cs.CV • a day ago

Weakly Supervised Pneumonia Localization from Chest X-Rays Using Deep Neural Network and Grad-CAM Explanations


A recent study introduces a weakly supervised deep learning framework for classifying and localizing pneumonia in chest X-ray images. It uses Gradient-weighted Class Activation Mapping (Grad-CAM) to generate heatmaps that highlight affected regions, avoiding the need for costly pixel-level annotations. The framework achieved accuracies of 96% to 98% across several pretrained models, including ResNet-18 and EfficientNet-B0.
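
As generally defined, Grad-CAM weights each feature map $A^k$ of a chosen convolutional layer by the spatial average of the gradient of the target class score with respect to that map, sums the weighted maps, and applies a ReLU so that only regions with a positive influence on the class remain. The NumPy sketch below takes the activations and gradients as given arrays; obtaining them from a real network and upsampling the map to the X-ray's resolution are omitted. It is not the study's code, and scaling the map by its maximum for display is an assumption.

```python
import numpy as np

# Core Grad-CAM computation on precomputed activations and gradients (one image, one class).


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """activations, gradients: (K, H, W); gradients[k] = d(class score) / d(activations[k])."""
    weights = gradients.mean(axis=(1, 2))              # alpha_k: global-average-pooled gradients
    cam = np.tensordot(weights, activations, axes=1)   # sum_k alpha_k * A^k -> (H, W)
    cam = np.maximum(cam, 0.0)                         # ReLU: keep positively contributing regions
    peak = cam.max()
    return cam / peak if peak > 0 else cam             # scale to [0, 1] for display (assumption)


if __name__ == "__main__":
    acts = np.zeros((2, 2, 2))
    acts[0, 0, 0] = 1.0   # map 0 fires in the top-left cell
    acts[1, 1, 1] = 1.0   # map 1 fires in the bottom-right cell
    grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])  # map 0 supports the class, map 1 opposes it
    print(grad_cam(acts, grads))
```

Because only a class label is needed to compute the gradients, the heatmap comes for free from a classifier, which is what makes the localization weakly supervised.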
