Decorrelation Speeds Up Vision Transformers

arXiv — cs.CV · Thursday, November 27, 2025 at 5:00:00 AM
  • Recent work on optimizing Vision Transformers (ViTs) integrates Decorrelated Backpropagation (DBP) into Masked Autoencoder (MAE) pre-training, reporting a 21.1% reduction in wall-clock training time and a 21.4% decrease in carbon emissions when training on datasets such as ImageNet-1K and ADE20K (a rough sketch of the mechanism follows this summary).
  • This development is significant as it enhances the efficiency of ViTs in low-label data scenarios, making them more practical for industrial applications where computational resources are limited.
  • The result fits a broader trend toward more efficient and sustainable ViT training, in which a range of strategies is being explored to cut resource consumption without sacrificing performance.
— via World Pulse Now AI Editorial System
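
The summary does not spell out the mechanism, so here is a minimal sketch of the general idea behind decorrelated backpropagation: a trainable decorrelation transform is placed in front of a weight layer and updated with a local rule that suppresses off-diagonal correlations in its output. The class name, learning-rate value, and exact update rule below are illustrative assumptions, not the paper's implementation; the intuition is that decorrelated layer inputs better condition the gradient updates, which is where the reported wall-clock savings would come from.

```python
# Illustrative sketch of a decorrelation layer in the spirit of Decorrelated
# Backpropagation (DBP): layer inputs are multiplied by a decorrelation matrix R
# that is updated with a local rule pushing off-diagonal correlations of the
# decorrelated activations toward zero. Names and the exact update rule are
# assumptions for illustration, not the paper's code.
import torch


class DecorrelationLayer(torch.nn.Module):
    def __init__(self, dim: int, lr: float = 1e-3):
        super().__init__()
        # R starts at identity, so the layer is initially a no-op.
        self.register_buffer("R", torch.eye(dim))
        self.lr = lr

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) activations feeding the next weight layer.
        z = x @ self.R.T
        if self.training:
            with torch.no_grad():
                # Batch estimate of the correlation of the decorrelated output.
                corr = (z.T @ z) / z.shape[0]
                off_diag = corr - torch.diag(torch.diag(corr))
                # Local update: shrink off-diagonal correlations.
                self.R -= self.lr * off_diag @ self.R
        return z
```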

Continue Reading
One Patch is All You Need: Joint Surface Material Reconstruction and Classification from Minimal Visual Cues
Positive · Artificial Intelligence
A new model named SMARC has been introduced, enabling surface material reconstruction and classification from minimal visual cues, specifically using just a 10% contiguous patch of an image. This approach addresses the limitations of existing methods that require dense observations, making it particularly useful in constrained environments.
Privacy-Preserving Federated Vision Transformer Learning Leveraging Lightweight Homomorphic Encryption in Medical AI
Positive · Artificial Intelligence
A new framework for privacy-preserving federated learning has been introduced, combining Vision Transformers with lightweight homomorphic encryption to enhance histopathology classification across multiple healthcare institutions. This approach addresses the challenges posed by privacy regulations like HIPAA, which restrict direct patient data sharing, while still enabling collaborative machine learning.
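
As a rough illustration of how encrypted aggregation can work in such a setting, the sketch below uses TenSEAL's CKKS scheme to average client weight updates without the server ever seeing them in plaintext. The parameter choices and the flattening of updates to short vectors are assumptions for illustration; the paper's actual protocol and encryption configuration may differ.

```python
# Minimal sketch of homomorphically encrypted federated averaging with TenSEAL's
# CKKS scheme: each client encrypts its (flattened) weight update, the server
# sums ciphertexts without seeing plaintext, and only the key holder decrypts
# the average. Toy parameters and data, for illustration only.
import tenseal as ts

# Key-holder side: CKKS context with typical toy parameters.
ctx = ts.context(ts.SCHEME_TYPE.CKKS,
                 poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()

# Client side: encrypt local weight updates (flattened to short vectors here).
client_updates = [[0.10, -0.20, 0.05], [0.08, -0.18, 0.07]]
encrypted = [ts.ckks_vector(ctx, upd) for upd in client_updates]

# Server side: aggregate ciphertexts; the server never sees plaintext updates.
agg = encrypted[0]
for enc in encrypted[1:]:
    agg = agg + enc
agg = agg * (1.0 / len(encrypted))  # encrypted mean

# Key-holder side: decrypt the averaged update.
print(agg.decrypt())  # approximately [0.09, -0.19, 0.06]
```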
Frequency-Aware Token Reduction for Efficient Vision Transformer
Positive · Artificial Intelligence
A new study introduces a frequency-aware token reduction strategy for Vision Transformers, addressing the computational complexity associated with token length. This method enhances efficiency by categorizing tokens into high-frequency and low-frequency groups, selectively preserving high-frequency tokens while aggregating low-frequency ones into a compact form.
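
A minimal sketch of what such a frequency-aware split could look like is given below: tokens are scored by their high-frequency energy over the spatial token grid, the highest-scoring tokens are kept, and the rest are merged into a single aggregate token. The FFT-based scoring and the mean-pooling merge are illustrative assumptions, not the paper's exact criterion.

```python
# Illustrative sketch of frequency-aware token reduction for a ViT: tokens are
# scored by high-frequency content over the spatial grid, the top-k
# high-frequency tokens are kept, and the low-frequency remainder is merged
# into one aggregate token. Scoring and merging choices are assumptions.
import torch


def reduce_tokens(tokens: torch.Tensor, grid: int, keep: int) -> torch.Tensor:
    # tokens: (N, D) patch tokens arranged on a grid x grid layout.
    n, d = tokens.shape
    fmap = tokens.reshape(grid, grid, d)

    # 2D FFT over the spatial axes; suppress the low-frequency centre band.
    freq = torch.fft.fft2(fmap, dim=(0, 1))
    freq = torch.fft.fftshift(freq, dim=(0, 1))
    c, r = grid // 2, grid // 4
    freq[c - r:c + r, c - r:c + r, :] = 0  # zero out low frequencies
    highpass = torch.fft.ifft2(torch.fft.ifftshift(freq, dim=(0, 1)), dim=(0, 1)).real

    # Per-token high-frequency energy decides which tokens survive.
    scores = highpass.reshape(n, d).norm(dim=-1)
    keep_idx = scores.topk(keep).indices
    drop_mask = torch.ones(n, dtype=torch.bool)
    drop_mask[keep_idx] = False

    merged = tokens[drop_mask].mean(dim=0, keepdim=True)  # one aggregate token
    return torch.cat([tokens[keep_idx], merged], dim=0)   # (keep + 1, D)


# Example: 196 tokens (14x14 grid), keep the 98 most high-frequency ones.
out = reduce_tokens(torch.randn(196, 768), grid=14, keep=98)
print(out.shape)  # torch.Size([99, 768])
```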
Mechanisms of Non-Monotonic Scaling in Vision Transformers
Neutral · Artificial Intelligence
A recent study on Vision Transformers (ViTs) reveals a non-monotonic scaling behavior, where deeper models like ViT-L may underperform compared to shallower variants such as ViT-S and ViT-B. This research identifies a three-phase pattern—Cliff-Plateau-Climb—indicating how representation quality evolves with depth, particularly noting the diminishing role of the [CLS] token in favor of patch tokens for better performance.
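
Analyses of this kind typically rest on layer-wise probing; the sketch below shows one generic way to compare [CLS] and mean patch-token representations at each depth with a linear probe. The data layout and probe choice are assumptions, not the study's protocol.

```python
# Illustrative layer-wise probing: fit a linear probe on the [CLS] token and on
# mean-pooled patch tokens at each depth and compare accuracies. Hidden states
# are assumed precomputed with shape (num_layers, num_samples, num_tokens, dim).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def probe_by_depth(hidden: np.ndarray, labels: np.ndarray) -> list[tuple[float, float]]:
    results = []
    for layer_states in hidden:                       # (N, T, D) per layer
        cls_feats = layer_states[:, 0, :]             # [CLS] token
        patch_feats = layer_states[:, 1:, :].mean(1)  # mean of patch tokens
        accs = []
        for feats in (cls_feats, patch_feats):
            x_tr, x_te, y_tr, y_te = train_test_split(
                feats, labels, test_size=0.2, random_state=0)
            probe = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
            accs.append(probe.score(x_te, y_te))
        results.append((accs[0], accs[1]))            # ([CLS] acc, patch acc)
    return results
```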
Rethinking Vision Transformer Depth via Structural Reparameterization
Positive · Artificial Intelligence
A new study proposes a branch-based structural reparameterization technique for Vision Transformers, aiming to reduce the number of stacked transformer layers while maintaining their representational capacity. This method operates during the training phase, allowing for the consolidation of parallel branches into streamlined models for efficient inference deployment.
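
The underlying identity is simple: parallel linear branches over the same input can be summed into a single linear operator. The toy example below demonstrates that folding for two fully connected branches; how the paper extends it to whole transformer branches is not captured here.

```python
# Minimal sketch of the reparameterization idea: parallel linear branches that
# are summed during training can be folded into one linear layer for inference,
# since (W1 x + b1) + (W2 x + b2) = (W1 + W2) x + (b1 + b2).
import torch


class ParallelBranches(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.branch_a = torch.nn.Linear(dim, dim)
        self.branch_b = torch.nn.Linear(dim, dim)

    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)

    def merge(self) -> torch.nn.Linear:
        # Fold both branches into one layer with identical outputs.
        fused = torch.nn.Linear(self.branch_a.in_features, self.branch_a.out_features)
        with torch.no_grad():
            fused.weight.copy_(self.branch_a.weight + self.branch_b.weight)
            fused.bias.copy_(self.branch_a.bias + self.branch_b.bias)
        return fused


x = torch.randn(4, 64)
m = ParallelBranches(64)
print(torch.allclose(m(x), m.merge()(x), atol=1e-6))  # True
```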
MambaEye: A Size-Agnostic Visual Encoder with Causal Sequential Processing
Positive · Artificial Intelligence
MambaEye has been introduced as a novel visual encoder that operates in a size-agnostic manner, utilizing a causal sequential processing approach. This model leverages the Mamba2 backbone and introduces relative move embedding to enhance adaptability to various image resolutions and scanning patterns, addressing a long-standing challenge in visual encoding.
Latent Diffusion Inversion Requires Understanding the Latent Space
Neutral · Artificial Intelligence
Recent research highlights the need for a deeper understanding of latent space in Latent Diffusion Models (LDMs), revealing that these models exhibit uneven memorization across latent codes and that different dimensions within a single latent code contribute variably to memorization. This study introduces a method to rank these dimensions based on their impact on the decoder pullback metric.
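
For intuition, the diagonal of the decoder pullback metric G = J^T J (with J the decoder Jacobian at a latent code) measures how strongly each latent dimension perturbs the decoded output, which gives one natural ranking score. The sketch below computes it with Jacobian-vector products against a toy decoder; the toy model and the use of the plain diagonal are assumptions, not the paper's exact procedure.

```python
# Illustrative ranking of latent dimensions by the diagonal of the decoder's
# pullback metric G = J^T J: the i-th entry ||J e_i||^2 measures how much a unit
# perturbation of latent dimension i changes the decoded output.
import torch
from torch.autograd.functional import jvp


def rank_latent_dims(decoder, z: torch.Tensor) -> torch.Tensor:
    scores = []
    for i in range(z.numel()):
        e_i = torch.zeros_like(z)
        e_i.view(-1)[i] = 1.0
        # Jacobian-vector product J e_i: decoder response to perturbing dim i.
        _, j_ei = jvp(decoder, (z,), (e_i,))
        scores.append(j_ei.pow(2).sum())              # ||J e_i||^2 = G_ii
    return torch.stack(scores).argsort(descending=True)


# Toy stand-in for an LDM decoder mapping a latent code to pixels.
toy_decoder = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(),
                                  torch.nn.Linear(32, 64))
print(rank_latent_dims(toy_decoder, torch.randn(8)))
```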
TSRE: Channel-Aware Typical Set Refinement for Out-of-Distribution Detection
Positive · Artificial Intelligence
A new method called Channel-Aware Typical Set Refinement (TSRE) has been proposed for Out-of-Distribution (OOD) detection, addressing the limitations of existing activation-based methods that often neglect channel characteristics, leading to inaccurate typical set estimations. This method enhances the separation between in-distribution and OOD data, improving model reliability in open-world environments.
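
In the same spirit, a simple channel-aware baseline estimates a per-channel typical activation range from in-distribution features and clips test features into it before scoring. The quantile-based range and the energy score in the sketch below are stand-in choices, not TSRE's estimator.

```python
# Illustrative channel-aware baseline for activation-based OOD detection:
# estimate a per-channel "typical" range from in-distribution features, clip
# test-time features channel by channel, then score with an energy score.
import torch


def fit_channel_ranges(id_feats: torch.Tensor, q: float = 0.95):
    # id_feats: (N, C) penultimate-layer features from in-distribution data.
    lo = torch.quantile(id_feats, 1 - q, dim=0)
    hi = torch.quantile(id_feats, q, dim=0)
    return lo, hi                                   # per-channel typical range


def ood_score(feats: torch.Tensor, head: torch.nn.Linear, lo, hi) -> torch.Tensor:
    clipped = feats.clamp(min=lo, max=hi)           # channel-aware rectification
    logits = head(clipped)
    return -torch.logsumexp(logits, dim=-1)         # higher => more OOD-like


# Toy usage with random stand-ins for features and a linear classification head.
id_feats = torch.randn(1000, 512)
lo, hi = fit_channel_ranges(id_feats)
head = torch.nn.Linear(512, 10)
print(ood_score(torch.randn(5, 512), head, lo, hi))
```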