UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective

arXiv — cs.CV · Wednesday, November 19, 2025 at 5:00:00 AM
  • The research introduces a new perspective on dataset pruning, emphasizing generalization over traditional fitting methods. By scoring samples based on models that have not encountered them during training, this approach aims to enhance the selection process, leading to more compact and informative datasets.
  • This development is significant as it addresses the limitations of existing pruning techniques that often result in a dense distribution of sample scores, which can hinder effective model performance. Improved dataset pruning can lead to more efficient deep learning applications across various domains.
  • The broader implications of this research resonate with ongoing discussions in the AI community regarding model robustness and generalization. As new methods for improving model performance emerge, the focus on generalization in dataset pruning reflects a shift toward more adaptive and resilient AI systems; a minimal sketch of the hold-out scoring idea follows below.
— via World Pulse Now AI Editorial System
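
The scoring idea in the summary above (evaluate each sample with a model that never trained on it) can be illustrated with a generic k-fold hold-out scheme. This is a minimal sketch under assumed choices (logistic regression as the scorer, true-class probability as the score, a keep-the-hardest pruning rule); it is not the paper's UNSEEN algorithm.

```python
# Minimal sketch of hold-out ("unseen") scoring for dataset pruning.
# Assumptions: k-fold splits, logistic regression as the scorer, true-class
# probability as the score, and a keep-the-hardest pruning rule. This is a
# generic illustration, not the paper's UNSEEN algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def unseen_scores(X, y, k=5, seed=0):
    """Score every sample with a model that never trained on it."""
    scores = np.zeros(len(X))
    for train_idx, held_idx in KFold(k, shuffle=True, random_state=seed).split(X):
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        proba = model.predict_proba(X[held_idx])
        # Assumes integer labels 0..C-1 so columns of proba line up with y.
        scores[held_idx] = proba[np.arange(len(held_idx)), y[held_idx]]
    return scores

def prune(X, y, keep_ratio=0.5):
    """Keep the lowest-scoring (hardest-for-unseen-models) samples."""
    keep = np.argsort(unseen_scores(X, y))[: int(keep_ratio * len(X))]
    return X[keep], y[keep]
```

The key property, per the summary, is only that the scoring model never saw the sample it scores; the concrete scorer and keep rule here are placeholders.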


Continue Reading
NOVAK: Unified adaptive optimizer for deep neural networks
Positive · Artificial Intelligence
NOVAK, a recently introduced unified adaptive optimizer for deep neural networks, combines several techniques, including adaptive moment estimation and lookahead synchronization, with the aim of improving the performance and efficiency of neural network training.
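
The summary does not spell out NOVAK's update rule; purely as an illustration of combining adaptive moment estimation with lookahead synchronization, here is a generic Lookahead wrapper around torch.optim.Adam (the wrapper, its k and alpha values, and the toy model are assumptions, not NOVAK itself).

```python
# Generic sketch: adaptive moment estimation (Adam) wrapped with Lookahead
# synchronization. Illustrates the combination named in the summary; this is
# not NOVAK's actual update rule.
import torch

class Lookahead:
    def __init__(self, inner, k=5, alpha=0.5):
        self.inner, self.k, self.alpha = inner, k, alpha
        self.steps = 0
        # Slow weights start as a detached copy of the model parameters.
        self.slow = [p.detach().clone()
                     for g in inner.param_groups for p in g["params"]]

    def zero_grad(self):
        self.inner.zero_grad()

    def step(self):
        self.inner.step()              # fast (Adam) update
        self.steps += 1
        if self.steps % self.k == 0:   # every k steps, synchronize
            fast = [p for g in self.inner.param_groups for p in g["params"]]
            with torch.no_grad():
                for p, s in zip(fast, self.slow):
                    s += self.alpha * (p - s)  # move slow weights toward fast
                    p.copy_(s)                 # reset fast weights to slow

model = torch.nn.Linear(10, 2)                 # toy model for illustration
opt = Lookahead(torch.optim.Adam(model.parameters(), lr=1e-3))
```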
The Role of Noisy Data in Improving CNN Robustness for Image Classification
Positive · Artificial Intelligence
A recent study highlights the importance of data quality in enhancing the robustness of convolutional neural networks (CNNs) for image classification, specifically through the introduction of controlled noise during training. Utilizing the CIFAR-10 dataset, the research demonstrates that incorporating just 10% noisy data can significantly reduce test loss and improve accuracy under corrupted conditions without adversely affecting performance on clean data.
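
The 10% figure comes from the summary; the noise type and strength below are assumptions, since the study's exact corruption protocol is not quoted here. A minimal way to mix controlled Gaussian input noise into a fixed fraction of training images:

```python
# Sketch: corrupt a fixed fraction of training images with Gaussian noise.
# The 10% fraction comes from the summary; noise type and sigma are assumptions.
import torch
from torch.utils.data import Dataset

class PartiallyNoisy(Dataset):
    def __init__(self, base, noisy_fraction=0.10, sigma=0.1, seed=0):
        self.base, self.sigma = base, sigma
        g = torch.Generator().manual_seed(seed)
        n_noisy = int(noisy_fraction * len(base))
        # Fix which indices receive noise, so the same images stay noisy every epoch.
        self.noisy_idx = set(torch.randperm(len(base), generator=g)[:n_noisy].tolist())

    def __len__(self):
        return len(self.base)

    def __getitem__(self, i):
        x, y = self.base[i]              # expects x as a float tensor in [0, 1]
        if i in self.noisy_idx:
            x = (x + self.sigma * torch.randn_like(x)).clamp(0.0, 1.0)
        return x, y
```

Wrapping a torchvision CIFAR-10 dataset (with a ToTensor transform) in this class and training a CNN on it approximates the kind of setup the summary describes.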
Cross-modal Proxy Evolving for OOD Detection with Vision-Language Models
Positive · Artificial Intelligence
A new framework named CoEvo has been proposed for zero-shot out-of-distribution (OOD) detection in vision-language models, addressing the challenges posed by the absence of labeled negatives. CoEvo employs a bidirectional adaptation mechanism for both textual and visual proxies, dynamically refining them based on contextual information from test images. This innovation aims to enhance the reliability of OOD detection in open-world applications.
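
CoEvo's bidirectional proxy adaptation is not detailed in the summary; for orientation, the sketch below shows only a generic proxy-based zero-shot OOD score (maximum softmax similarity between an image embedding and class text proxies), the kind of baseline such methods refine. The temperature and normalization choices are assumptions.

```python
# Generic proxy-based zero-shot OOD score, NOT CoEvo's bidirectional
# adaptation: score an image by its maximum softmax similarity to class
# text proxies; a low maximum suggests the image is out-of-distribution.
import numpy as np

def ood_score(image_feat, text_proxies, temperature=0.01):
    img = image_feat / np.linalg.norm(image_feat)
    txt = text_proxies / np.linalg.norm(text_proxies, axis=1, keepdims=True)
    z = (txt @ img) / temperature       # cosine similarities, temperature-scaled
    z -= z.max()                        # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    return probs.max()                  # low value -> likely OOD
```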
Closed-Loop LLM Discovery of Non-Standard Channel Priors in Vision Models
Positive · Artificial Intelligence
A recent study has introduced a closed-loop framework for Neural Architecture Search (NAS) utilizing Large Language Models (LLMs) to optimize channel configurations in vision models. This approach addresses the combinatorial challenges of layer specifications in deep neural networks by leveraging LLMs to generate and refine architectural designs based on performance data.
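
The closed-loop structure described in the summary can be sketched as a propose / train / feed-back cycle. The propose_channels and train_and_eval callables below are hypothetical placeholders for the LLM proposer and the proxy training run; the paper's prompts, models, and evaluation protocol are not reproduced here.

```python
# Skeleton of the closed loop: a proposer suggests channel widths, a short
# training run scores them, and the result is fed back as context for the
# next proposal. `propose_channels` and `train_and_eval` are hypothetical
# placeholders for the LLM call and the proxy training run.
def closed_loop_search(propose_channels, train_and_eval, rounds=10):
    history, best = [], None             # history of (channel config, accuracy)
    for _ in range(rounds):
        cfg = propose_channels(history)  # e.g. returns [32, 64, 96, 128]
        acc = train_and_eval(cfg)        # short proxy training run
        history.append((cfg, acc))
        if best is None or acc > best[1]:
            best = (cfg, acc)
    return best, history
```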
DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning
Positive · Artificial Intelligence
The introduction of the Diffusion-Guided Autoencoder (DGAE) marks a significant advancement in latent representation learning, enhancing the decoder's expressiveness and effectively addressing training instability associated with GANs. This model achieves state-of-the-art performance while utilizing a latent space that is twice as compact, thus improving efficiency in image and video generative tasks.
A Preliminary Agentic Framework for Matrix Deflation
Positive · Artificial Intelligence
A new framework for matrix deflation has been proposed, utilizing an agentic approach where a Large Language Model (LLM) generates rank-1 Singular Value Decomposition (SVD) updates, while a Vision Language Model (VLM) evaluates these updates, enhancing solver stability through in-context learning and strategic permutations. This method was tested on various matrices, demonstrating promising results in noise reduction and accuracy.
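
Setting aside the agentic LLM/VLM loop (whose prompts and evaluation criteria are not given in the summary), the numerical kernel it builds on is ordinary rank-1 deflation: repeatedly subtract the leading singular component sigma * u v^T. A minimal NumPy version:

```python
# Core rank-1 deflation step: repeatedly subtract the leading singular
# component sigma * u v^T from the residual. The agentic LLM/VLM loop in the
# summary is not reproduced here; this is only the kernel it builds on.
import numpy as np

def deflate(A, steps=3):
    residual = A.astype(float).copy()
    components = []
    for _ in range(steps):
        U, S, Vt = np.linalg.svd(residual, full_matrices=False)
        u, sigma, v = U[:, 0], S[0], Vt[0]
        components.append((sigma, u, v))
        residual -= sigma * np.outer(u, v)   # remove the dominant rank-1 part
    return components, residual

A = np.random.default_rng(0).normal(size=(6, 4))
components, residual = deflate(A)
print(np.linalg.norm(residual))              # shrinks as components are removed
```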
Supervised Spike Agreement Dependent Plasticity for Fast Local Learning in Spiking Neural Networks
Positive · Artificial Intelligence
A new supervised learning rule, Spike Agreement-Dependent Plasticity (SADP), has been introduced to enhance fast local learning in spiking neural networks (SNNs). This method replaces traditional pairwise spike-timing comparisons with population-level agreement metrics, allowing for efficient supervised learning without backpropagation or surrogate gradients. Extensive experiments on datasets like MNIST and CIFAR-10 demonstrate its effectiveness.
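
The summary only says that pairwise spike-timing terms are replaced by a population-level agreement metric. One toy reading of that idea (an assumption-laden illustration, not the paper's SADP rule) is to gate a local, Hebbian-style update by how closely the output population's spikes agree with a supervised target pattern:

```python
# Toy illustration only: a local, Hebbian-style update gated by a
# population-level agreement score between output spikes and a supervised
# target spike pattern. This is an assumption-laden sketch, not the paper's
# SADP rule.
import numpy as np

def agreement(out_spikes, target_spikes):
    """Fraction of positions where output and target spike patterns match."""
    return float(np.mean(out_spikes == target_spikes))

def local_update(W, pre_spikes, post_spikes, target_spikes, lr=1e-2):
    a = agreement(post_spikes, target_spikes)          # population-level signal
    # Push weights toward the supervised target, scaled by how much of the
    # population still disagrees with it; no backpropagation is involved.
    dW = lr * (1.0 - a) * np.outer(target_spikes - post_spikes, pre_spikes)
    return W + dW
```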
Deep Exploration of Epoch-wise Double Descent in Noisy Data: Signal Separation, Large Activation, and Benign Overfitting
Neutral · Artificial Intelligence
A recent study has empirically investigated epoch-wise double descent in deep learning, particularly focusing on the effects of noisy data on model generalization. Using fully connected neural networks trained on the CIFAR-10 dataset with 30% label noise, the research revealed that models can achieve strong re-generalization even after overfitting to noisy data, indicating a state of benign overfitting.
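
Label-noise experiments of this kind are usually set up with symmetric label flipping; the 30% rate comes from the summary, while the flipping scheme below is an assumption rather than the study's quoted protocol.

```python
# Sketch: inject symmetric label noise into a fraction of the training labels,
# as is common in double-descent / benign-overfitting experiments. The 30%
# rate comes from the summary; the symmetric-flip scheme is an assumption.
import numpy as np

def add_label_noise(labels, noise_rate=0.30, num_classes=10, seed=0):
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip_idx = rng.choice(len(noisy), size=int(noise_rate * len(noisy)), replace=False)
    for i in flip_idx:
        # Replace with a uniformly drawn *different* class.
        noisy[i] = rng.choice([c for c in range(num_classes) if c != noisy[i]])
    return noisy, flip_idx
```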
