Parameter Reduction Improves Vision Transformers: A Comparative Study of Sharing and Width Reduction

arXiv — cs.LG · Tuesday, December 2, 2025 at 5:00:00 AM
  • A recent study on Vision Transformers (ViTs) highlights the effectiveness of two parameter-reduction strategies, GroupedMLP and ShallowMLP, which improve accuracy and training stability while cutting the parameter count by 32.7%. The GroupedMLP variant achieved 81.47% top-1 accuracy, while ShallowMLP reached 81.25% with higher inference throughput. Both surpassed the 81.05% baseline of ViT-B/16 trained on ImageNet-1K (a rough sketch of both variants follows below).
  • These results matter because they show that reducing model complexity can improve, rather than degrade, accuracy and training stability in Vision Transformers, which are widely used in computer vision. The findings suggest that better parameter usage can yield stronger results without resorting to larger models, potentially influencing future research and applications in AI.
  • The exploration of parameter reduction in ViTs aligns with ongoing efforts in the AI community to enhance model efficiency and performance. Techniques such as Decorrelated Backpropagation and structural reparameterization are also being investigated to improve training speed and reduce computational costs. This trend reflects a broader shift towards developing more efficient AI models that maintain high accuracy while minimizing resource consumption.
— via World Pulse Now AI Editorial System
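The summary does not give the exact layer designs, but the two names, together with the paper's title ("sharing and width reduction"), suggest one plausible reading: GroupedMLP splits each transformer MLP into independent groups, much like grouped convolutions, while ShallowMLP simply narrows the hidden layer, which would also account for its higher throughput. A minimal PyTorch sketch under those assumptions (not the paper's confirmed architecture):

```python
import torch
import torch.nn as nn

class GroupedMLP(nn.Module):
    """Transformer MLP split into independent groups (analogous to grouped
    convolutions): each group sees only its slice of channels, cutting the
    two linear layers' parameters by roughly a factor of `groups`."""
    def __init__(self, dim=768, hidden=3072, groups=4):
        super().__init__()
        assert dim % groups == 0 and hidden % groups == 0
        # 1x1 grouped convolutions act as per-group linear layers over tokens.
        self.fc1 = nn.Conv1d(dim, hidden, kernel_size=1, groups=groups)
        self.act = nn.GELU()
        self.fc2 = nn.Conv1d(hidden, dim, kernel_size=1, groups=groups)

    def forward(self, x):                 # x: (batch, tokens, dim)
        x = x.transpose(1, 2)             # -> (batch, dim, tokens) for Conv1d
        x = self.fc2(self.act(self.fc1(x)))
        return x.transpose(1, 2)

class ShallowMLP(nn.Module):
    """Transformer MLP with a narrower hidden layer (e.g. 2x instead of the
    usual 4x expansion): fewer parameters and fewer FLOPs, which would also
    explain the higher inference throughput reported above."""
    def __init__(self, dim=768, ratio=2.0):
        super().__init__()
        hidden = int(dim * ratio)
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return self.net(x)
```

Either module would drop into a ViT-B/16 block in place of the standard 4x-expansion MLP; the 32.7% figure above refers to the whole model, so the per-block ratios here are illustrative rather than the paper's exact configuration.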


Continue Reading
On the Problem of Consistent Anomalies in Zero-Shot Anomaly Detection
Positive · Artificial Intelligence
A recent dissertation has addressed the challenges of zero-shot anomaly classification and segmentation, which are essential for detecting anomalies without prior training data. The study formalizes the issue of consistent anomalies, which can bias distance-based detection methods, and introduces CoDeGraph, a framework designed to filter these anomalies effectively.
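To see why consistent anomalies bias distance-based detection: the usual zero-shot scoring rule flags a patch as anomalous when its feature is far from every patch gathered from other images, so a defect that recurs across images receives a small nearest-neighbor distance and is silently scored as normal. A short PyTorch sketch of that vulnerable baseline (function and tensor names are illustrative; CoDeGraph's filtering mechanism itself is not shown):

```python
import torch

def knn_anomaly_score(test_feats, ref_feats, k=1):
    """Distance-based zero-shot scoring: a patch is anomalous if it is far
    from every reference patch. A 'consistent anomaly' that recurs in the
    reference set gets a small k-NN distance and is wrongly scored normal --
    the bias CoDeGraph is designed to filter out."""
    # test_feats: (n_test, d), ref_feats: (n_ref, d) feature tensors
    d = torch.cdist(test_feats, ref_feats)  # pairwise Euclidean distances
    return d.topk(k, dim=1, largest=False).values.mean(dim=1)
```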
LightHCG: a Lightweight yet powerful HSIC Disentanglement based Causal Glaucoma Detection Model framework
Positive · Artificial Intelligence
A new framework named LightHCG has been introduced for glaucoma detection, leveraging HSIC disentanglement and advanced AI models like Vision Transformers and VGG16. This model aims to enhance the accuracy of glaucoma diagnosis by analyzing retinal images, addressing the limitations of traditional diagnostic methods that rely heavily on subjective assessments and manual measurements.
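HSIC (the Hilbert-Schmidt Independence Criterion) is a kernel-based measure of statistical dependence, and a standard way to disentangle two latent factors is to penalize their HSIC. The summary does not spell out LightHCG's loss, so the following is only a generic biased HSIC estimator of the kind such a penalty could use:

```python
import torch

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2, where K and L are
    RBF kernel matrices over x and y and H is the centering matrix. Adding
    hsic(x, y) to a training loss pushes the two representations toward
    statistical independence (one plausible reading of 'HSIC disentanglement')."""
    n = x.size(0)
    def rbf(a):
        d = torch.cdist(a, a) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    K, L = rbf(x), rbf(y)
    H = torch.eye(n, device=x.device) - torch.full((n, n), 1.0 / n, device=x.device)
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2
```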
PRISM: Diversifying Dataset Distillation by Decoupling Architectural Priors
Positive · Artificial Intelligence
The introduction of PRISM (PRIors from diverse Source Models) marks a significant advancement in dataset distillation, addressing the limitations of existing methods that often rely on a single teacher model. By decoupling architectural priors during the synthesis process, PRISM enhances the generation of synthetic data, leading to improved intra-class diversity and generalization, particularly on the ImageNet-1K dataset.
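The summary frames PRISM's key move as replacing the single teacher with architecturally diverse source models during synthesis. One hedged way to picture that is a synthetic-image update whose gradient is averaged across several teachers, so no single architecture's inductive bias dominates the distilled data (the helper below is an illustration, not the paper's actual objective):

```python
import torch
import torch.nn.functional as F

def multi_teacher_step(synthetic_x, synthetic_y, teachers, lr=0.1):
    """One gradient step on synthetic images guided by several architecturally
    diverse teachers instead of one -- a rough sketch of the decoupling idea
    the summary attributes to PRISM."""
    synthetic_x = synthetic_x.detach().requires_grad_(True)
    # Average the classification loss over all teachers before differentiating.
    loss = sum(F.cross_entropy(t(synthetic_x), synthetic_y) for t in teachers) / len(teachers)
    grad, = torch.autograd.grad(loss, synthetic_x)
    with torch.no_grad():
        synthetic_x -= lr * grad
    return synthetic_x.detach()
```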
Comparative Analysis of Vision Transformer, Convolutional, and Hybrid Architectures for Mental Health Classification Using Actigraphy-Derived Images
Positive · Artificial Intelligence
A comparative analysis was conducted on three image-based methods—VGG16, ViT-B/16, and CoAtNet-Tiny—to classify mental health conditions such as depression and schizophrenia using actigraphy-derived images. The study utilized wrist-worn activity signals from the Psykose and Depresjon datasets, converting them into images for evaluation. CoAtNet-Tiny emerged as the most reliable method, achieving the highest average accuracy and stability across different data folds.
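The summary does not specify the image encoding, but a common choice for actigraphy is to fold the minute-level activity series into a days-by-minutes grid and treat it as a grayscale image. A sketch under that assumption (the function and layout are illustrative, not the study's confirmed pipeline):

```python
import numpy as np
from PIL import Image

def actigraphy_to_image(counts, minutes_per_day=1440):
    """Fold a 1-D minute-level activity series into a (days x minutes) grid
    and save it as a grayscale image, so a CNN/ViT/hybrid can consume it."""
    days = len(counts) // minutes_per_day
    grid = np.asarray(counts[: days * minutes_per_day], dtype=np.float32)
    grid = grid.reshape(days, minutes_per_day)
    # Min-max normalize to [0, 255] for an 8-bit image.
    grid = 255 * (grid - grid.min()) / (grid.max() - grid.min() + 1e-8)
    return Image.fromarray(grid.astype(np.uint8), mode="L")
```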
Hierarchical Semantic Alignment for Image Clustering
Positive · Artificial Intelligence
A new method for image clustering, named Hierarchical Semantic Alignment (CAE), has been proposed to enhance the categorization of images by addressing the ambiguity of nouns in semantic representations. This approach integrates caption-level descriptions and noun-level concepts to construct a semantic space that aligns with image features, improving clustering performance without the need for training.
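For intuition, the training-free baseline behind such methods can be as simple as assigning each image embedding to its nearest text-concept embedding in a shared vision-language space such as CLIP's; the proposed method refines this by combining caption-level and noun-level semantics to resolve ambiguous nouns. A minimal sketch (tensor names are illustrative):

```python
import torch
import torch.nn.functional as F

def semantic_clustering(image_feats, concept_feats):
    """Training-free clustering baseline: each image is assigned the index of
    its most similar noun concept in a shared embedding space. The paper's
    contribution is building a better-aligned semantic space, not this rule."""
    img = F.normalize(image_feats, dim=1)    # (n_images, d)
    txt = F.normalize(concept_feats, dim=1)  # (n_concepts, d)
    return (img @ txt.T).argmax(dim=1)       # cluster id per image
```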