Likelihood-guided Regularization in Attention Based Models

arXiv — stat.ML · Tuesday, November 18, 2025, 5:00 AM
  • A new framework for Vision Transformers (ViTs) has been proposed, focusing on likelihood-guided regularization of attention-based training.
  • This development is significant as it addresses the challenges of overfitting in high-capacity models.
  • The introduction of this framework aligns with ongoing advancements in transformer architectures, emphasizing the need for efficient training methods. As AI continues to evolve, the integration of adaptive techniques like this one reflects a broader trend towards optimizing model performance while maintaining interpretability, a crucial factor in AI deployment.
— via World Pulse Now AI Editorial System
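
The bullets above do not spell out the regularizer itself; as a minimal sketch, one can read "likelihood-guided regularization" as an auxiliary negative log-likelihood penalty added to the usual classification loss of an attention model. Everything below (the Gaussian penalty on pooled features, the weight `lam`) is an illustrative assumption, not the paper's stated method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAttentionClassifier(nn.Module):
    def __init__(self, dim=64, heads=4, classes=10):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, classes)

    def forward(self, x):                          # x: (batch, tokens, dim)
        feats, _ = self.attn(x, x, x)
        pooled = feats.mean(dim=1)                 # mean-pool over tokens
        return self.head(pooled), pooled

def likelihood_penalty(feats):
    # Negative log-likelihood of features under a standard Gaussian
    # (up to constants); discourages runaway feature norms, one symptom
    # of overfitting. An illustrative choice of likelihood, not the paper's.
    return 0.5 * feats.pow(2).mean()

model = TinyAttentionClassifier()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
x = torch.randn(8, 16, 64)                         # fake batch: 8 seqs of 16 tokens
y = torch.randint(0, 10, (8,))

logits, feats = model(x)
lam = 0.1                                          # assumed regularization weight
loss = F.cross_entropy(logits, y) + lam * likelihood_penalty(feats)
loss.backward()
opt.step()
```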


Recommended Readings
WARP-LUTs - Walsh-Assisted Relaxation for Probabilistic Look Up Tables
Positive · Artificial Intelligence
WARP-LUTs, or Walsh-Assisted Relaxation for Probabilistic Look-Up Tables, is a novel gradient-based method introduced to enhance machine learning efficiency. This approach focuses on learning combinations of logic gates with fewer trainable parameters, addressing the high computational costs associated with training models like Differentiable Logic Gate Networks (DLGNs). WARP-LUTs aim to improve accuracy, resource usage, and latency, making them a significant advancement in the field of AI.
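As context for what WARP-LUTs relax more efficiently, a minimal sketch of the DLGN-style idea follows: each two-input gate holds a trainable softmax over the 16 Boolean functions, so the gate choice becomes differentiable. The Walsh-assisted machinery itself is not shown; class and parameter names are illustrative, not the paper's API.

```python
import torch
import torch.nn as nn

def all_gates(a, b):
    # Relaxed truth tables of the 16 two-input Boolean functions on [0, 1].
    return torch.stack([
        torch.zeros_like(a), a * b, a - a * b, a,
        b - a * b, b, a + b - 2 * a * b, a + b - a * b,
        1 - (a + b - a * b), 1 - (a + b - 2 * a * b), 1 - b, 1 - b + a * b,
        1 - a, 1 - a + a * b, 1 - a * b, torch.ones_like(a),
    ], dim=-1)

class SoftLogicLayer(nn.Module):
    def __init__(self, n_gates, n_inputs):
        super().__init__()
        # Each gate reads two fixed, randomly chosen input wires and
        # learns a distribution over the 16 possible gate functions.
        self.register_buffer("ia", torch.randint(0, n_inputs, (n_gates,)))
        self.register_buffer("ib", torch.randint(0, n_inputs, (n_gates,)))
        self.logits = nn.Parameter(torch.zeros(n_gates, 16))

    def forward(self, x):                         # x: (batch, n_inputs) in [0, 1]
        a, b = x[:, self.ia], x[:, self.ib]
        probs = self.logits.softmax(dim=-1)       # (n_gates, 16)
        return (all_gates(a, b) * probs).sum(-1)  # expected gate output

layer = SoftLogicLayer(n_gates=32, n_inputs=8)
out = layer(torch.rand(4, 8))                     # (4, 32), differentiable
```
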
Attention Via Convolutional Nearest Neighbors
Positive · Artificial Intelligence
The article introduces Convolutional Nearest Neighbors (ConvNN), a framework that unifies Convolutional Neural Networks (CNNs) and Transformers by viewing convolution and self-attention as neighbor selection and aggregation methods. ConvNN allows for a systematic exploration of the spectrum between these two architectures, serving as a drop-in replacement for convolutional and attention layers. The framework's effectiveness is validated through classification tasks on CIFAR-10 and CIFAR-100 datasets.
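A minimal sketch of the neighbor-selection view: convolution aggregates a fixed spatial neighborhood, attention aggregates the most similar tokens, and a top-k-by-similarity layer sits between the two. The layer below illustrates that spectrum; it is not ConvNN's actual implementation, and all names are assumptions.

```python
import torch
import torch.nn as nn

class TopKNeighborAggregation(nn.Module):
    def __init__(self, dim, k=5):
        super().__init__()
        self.k = k
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                          # x: (batch, tokens, dim)
        sim = x @ x.transpose(1, 2)                # pairwise token similarity
        idx = sim.topk(self.k, dim=-1).indices     # (batch, tokens, k) neighbors
        neighbors = torch.gather(
            x.unsqueeze(1).expand(-1, x.size(1), -1, -1),   # (b, t, t, d)
            2, idx.unsqueeze(-1).expand(-1, -1, -1, x.size(-1)))
        return self.proj(neighbors.mean(dim=2))    # aggregate, then mix channels

layer = TopKNeighborAggregation(dim=32, k=5)
tokens = torch.randn(2, 49, 32)                    # e.g. a 7x7 patch grid, flattened
out = layer(tokens)                                # (2, 49, 32)
```
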
MI-to-Mid Distilled Compression (M2M-DC): A Hybrid Information-Guided Block Pruning with Progressive Inner Slicing Approach to Model Compression
Positive · Artificial Intelligence
MI-to-Mid Distilled Compression (M2M-DC) is a novel compression framework that combines information-guided block pruning with progressive inner slicing and staged knowledge distillation. The method ranks residual blocks based on a mutual information signal, removing the least informative units. It alternates short knowledge distillation phases with channel slicing to maintain computational efficiency while preserving model accuracy. The approach has demonstrated promising results on CIFAR-100, achieving high accuracy with significantly reduced parameters.
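A rough sketch of the information-guided ranking step, assuming each block's pooled output is scored with an off-the-shelf mutual-information estimator and the lowest-scoring block is dropped; M2M-DC's exact MI signal, keep-ratio, and distillation schedule are not reproduced here.

```python
import torch
import torch.nn as nn
from sklearn.feature_selection import mutual_info_classif

def block_mi_score(h, labels):
    # Proxy MI between a block's output features and the class labels.
    feats = h.detach().flatten(1).numpy()
    return mutual_info_classif(feats, labels.numpy()).mean()

blocks = nn.ModuleList(nn.Sequential(nn.Linear(16, 16), nn.ReLU())
                       for _ in range(4))          # stand-in residual blocks
x = torch.randn(128, 16)
y = torch.randint(0, 10, (128,))

scores, h = [], x
for blk in blocks:
    h = h + blk(h)                                 # residual forward pass
    scores.append(block_mi_score(h, y))

drop = min(range(len(scores)), key=scores.__getitem__)
kept = nn.ModuleList(blocks[i] for i in range(len(blocks)) if i != drop)
# ... a short knowledge-distillation phase would follow to recover accuracy.
```
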
Vision Transformers with Self-Distilled Registers
Positive · Artificial Intelligence
Vision Transformers (ViTs) have become the leading architecture for visual processing tasks, showcasing remarkable scalability with larger training datasets and model sizes. However, recent findings have revealed the presence of artifact tokens in ViTs that conflict with local semantics, negatively impacting performance in tasks requiring precise localization and structural coherence. This paper introduces register tokens to mitigate this issue, proposing Post Hoc Registers (PH-Reg) as an efficient self-distillation method to integrate these tokens into existing ViTs without the need for retraining.
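Mechanically, register tokens are extra learned tokens appended to the patch sequence that participate in attention and are discarded before the output head; PH-Reg's contribution is grafting them onto an already-trained ViT via self-distillation. A minimal sketch of the token mechanics only:

```python
import torch
import torch.nn as nn

class BlockWithRegisters(nn.Module):
    def __init__(self, dim=64, heads=4, n_registers=4):
        super().__init__()
        # Learned register tokens, shared across the batch.
        self.registers = nn.Parameter(torch.zeros(1, n_registers, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.n_registers = n_registers

    def forward(self, patches):                  # (batch, n_patches, dim)
        reg = self.registers.expand(patches.size(0), -1, -1)
        x = torch.cat([patches, reg], dim=1)     # append registers to sequence
        x, _ = self.attn(x, x, x)                # registers absorb global activity
        return x[:, :-self.n_registers]          # discard registers at the output

block = BlockWithRegisters()
out = block(torch.randn(2, 196, 64))             # (2, 196, 64), registers removed
```
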
DeepDefense: Layer-Wise Gradient-Feature Alignment for Building Robust Neural Networks
Positive · Artificial Intelligence
Deep neural networks are susceptible to adversarial perturbations that can lead to incorrect predictions. The paper introduces DeepDefense, a defense framework utilizing Gradient-Feature Alignment (GFA) regularization across multiple layers to mitigate this vulnerability. By aligning input gradients with internal feature representations, DeepDefense creates a smoother loss landscape, reducing sensitivity to adversarial noise. The method shows significant robustness improvements against various attacks, particularly on the CIFAR-10 dataset.
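One plausible reading of the alignment term, sketched below: at a chosen layer, penalize the cosine misalignment between the gradient of the task loss with respect to that layer's activations and the activations themselves. DeepDefense's exact layer set and alignment target may differ; the weight 0.5 is assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
x = torch.randn(16, 32)
y = torch.randint(0, 10, (16,))

feats = net[1](net[0](x))                       # activations at the chosen layer
logits = net[2](feats)
task_loss = F.cross_entropy(logits, y)

# Gradient of the task loss w.r.t. the layer activations, kept in the
# graph (create_graph=True) so the alignment penalty is itself differentiable.
(g,) = torch.autograd.grad(task_loss, feats, create_graph=True)
align = F.cosine_similarity(g.flatten(1), feats.flatten(1), dim=1)
loss = task_loss + 0.5 * (1.0 - align).mean()   # 0.5 is an assumed weight
loss.backward()
```
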
ChemFixer: Correcting Invalid Molecules to Unlock Previously Unseen Chemical Space
Positive · Artificial Intelligence
ChemFixer is a new framework designed to correct invalid molecules generated by deep learning-based molecular generation models. These models have shown promise in exploring chemical spaces for potential drug candidates, but often produce chemically invalid outputs. ChemFixer utilizes a transformer architecture and is fine-tuned on a dataset of valid and invalid molecular pairs. Evaluations indicate that it enhances molecular validity while maintaining the chemical and biological properties of the original outputs, thus expanding the usable chemical space.
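The well-defined half of this pipeline is the validity test: RDKit's `Chem.MolFromSmiles` returns `None` for chemically invalid SMILES. The corrector itself is a fine-tuned transformer; `fix_smiles` below is a hypothetical stand-in for it, not ChemFixer's API.

```python
from rdkit import Chem

def is_valid(smiles: str) -> bool:
    # Real RDKit behavior: MolFromSmiles returns None on invalid input.
    return Chem.MolFromSmiles(smiles) is not None

def fix_smiles(smiles: str) -> str:
    # Hypothetical stand-in for the seq2seq corrector trained on
    # (invalid, valid) molecular pairs; identity placeholder here.
    return smiles

def repair_generated(candidates):
    kept = []
    for smi in candidates:
        if is_valid(smi):
            kept.append(smi)              # already valid: keep as-is
        else:
            fixed = fix_smiles(smi)       # one correction pass
            if is_valid(fixed):
                kept.append(fixed)        # recovered a usable molecule
    return kept

print(repair_generated(["CCO", "c1ccccc1", "C1CC"]))  # last one has an unclosed ring
```
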
Observational Auditing of Label Privacy
Positive · Artificial Intelligence
The article discusses a new framework for differential privacy auditing in machine learning systems. Traditional methods require altering training datasets, which can be resource-intensive. The proposed observational auditing framework utilizes the randomness of data distributions to evaluate privacy without modifying the original dataset. This approach extends privacy auditing to protected attributes, including labels, addressing significant gaps in existing techniques. Experiments conducted on Criteo and CIFAR-10 datasets validate its effectiveness.
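The auditing arithmetic underneath is standard: any (eps, delta)-differentially-private mechanism bounds an attacker's true-positive rate by TPR <= exp(eps) * FPR + delta, so measured attack rates yield a lower bound on eps. The observational twist is extracting TPR/FPR from natural label randomness rather than injected canaries; the sketch below shows only the bound, with made-up rates.

```python
import math

def eps_lower_bound(tpr: float, fpr: float, delta: float = 1e-5) -> float:
    # From TPR <= exp(eps) * FPR + delta, rearranged for eps.
    if tpr <= delta or fpr <= 0.0:
        return 0.0
    return max(0.0, math.log((tpr - delta) / fpr))

# Illustrative attack rates for an attacker guessing withheld labels:
print(eps_lower_bound(tpr=0.60, fpr=0.05))   # ~2.48
```
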
UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective
Positive · Artificial Intelligence
The paper titled 'UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective' addresses the computational challenges posed by large datasets in deep learning. It proposes a novel approach to dataset pruning that focuses on generalization rather than fitting, scoring samples based on models not exposed to them during training. This method aims to create a more effective selection process by reducing the concentration of sample scores, ultimately improving the performance of deep learning models.
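A minimal sketch of "score samples only with models that never saw them," done K-fold style: each sample's score comes from a model trained on the other folds. Keeping the hardest fraction is a common pruning rule used here for illustration, not necessarily UNSEEN's exact selection criterion.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

scores = np.empty(len(X))
for train_idx, held_idx in KFold(n_splits=5, shuffle=True,
                                 random_state=0).split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    proba = model.predict_proba(X[held_idx])      # this model never saw held_idx
    scores[held_idx] = -np.log(                   # per-sample held-out loss
        proba[np.arange(len(held_idx)), y[held_idx]] + 1e-12)

keep_ratio = 0.5                                      # prune half the dataset
keep = np.argsort(scores)[-int(keep_ratio * len(X)):] # keep hardest samples
X_pruned, y_pruned = X[keep], y[keep]
```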