Progressive Supernet Training for Efficient Visual Autoregressive Modeling

arXiv · cs.CV · Friday, November 21, 2025 at 5:00:00 AM
  • The introduction of VARiant marks a significant advancement in Visual Autoregressive (VAR) modeling.
  • This matters because it targets a key limitation of existing VAR models: the memory overhead that can hinder their performance in practical deployment scenarios.
  • More broadly, the work fits ongoing efforts in the AI community to improve model efficiency and reduce biases, alongside related approaches that tackle problems in computer vision and generative modeling.
— via World Pulse Now AI Editorial System


Continue Reading
Unsupervised Image Classification with Adaptive Nearest Neighbor Selection and Cluster Ensembles
Positive | Artificial Intelligence
The paper presents a novel approach to unsupervised image classification, focusing on clustering unlabeled images into meaningful categories. The method, named Image Clustering through Cluster Ensembles (ICCE), enhances clustering performance by integrating adaptive nearest neighbor selection and cluster ensembling strategies. This approach allows for the training of multiple clustering heads on a fixed backbone, resulting in diverse clusterings that are consolidated into a unified consensus clustering.
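The summary does not say how ICCE merges its clusterings into a consensus. One common technique for consolidating several clusterings is a co-association matrix, which records how often each pair of images lands in the same cluster. The sketch below assumes that technique plus a simple threshold-and-connect rule; the function names and the `threshold` parameter are hypothetical, and the backbone, the clustering heads and the adaptive nearest-neighbor selection are omitted.

```python
import numpy as np

def co_association(labelings):
    """Fraction of clusterings (heads) that put each pair of samples in the same cluster."""
    labelings = np.asarray(labelings)          # shape: (num_heads, num_samples)
    num_heads, n = labelings.shape
    C = np.zeros((n, n))
    for labels in labelings:
        # boolean "same cluster" matrix for this head, accumulated as 0/1 counts
        C += labels[:, None] == labels[None, :]
    return C / num_heads

def consensus_clusters(labelings, threshold=0.5):
    """Hypothetical consensus rule: link samples whose co-association exceeds
    `threshold`, then label each connected component as one consensus cluster."""
    C = co_association(labelings)
    n = C.shape[0]
    consensus = -np.ones(n, dtype=int)         # -1 marks "not yet assigned"
    next_label = 0
    for start in range(n):
        if consensus[start] != -1:
            continue
        consensus[start] = next_label
        stack = [start]
        while stack:                           # depth-first flood fill over linked samples
            i = stack.pop()
            for j in np.flatnonzero(C[i] > threshold):
                if consensus[j] == -1:
                    consensus[j] = next_label
                    stack.append(j)
        next_label += 1
    return consensus

# Three heads that disagree on label names and on one sample:
heads = [[0, 0, 1, 1, 2, 2],
         [1, 1, 0, 0, 2, 2],
         [0, 0, 1, 1, 1, 2]]
print(consensus_clusters(heads))               # [0 0 1 1 2 2]
```

Because the co-association matrix only asks whether two samples share a cluster, it is invariant to how each head happens to number its clusters, which is why heads 1 and 2 above can use swapped labels without conflict.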
Spatial-and-Frequency-aware Restoration method for Images based on Diffusion Models
Positive | Artificial Intelligence
The paper presents SaFaRI, a spatial-and-frequency-aware diffusion model designed for image restoration (IR) that effectively handles Gaussian noise. This model enhances reconstruction quality by maintaining data fidelity in both spatial and frequency domains. Comprehensive evaluations demonstrate that SaFaRI outperforms existing zero-shot IR methods on ImageNet and FFHQ datasets, achieving state-of-the-art performance in various noisy inverse problems.
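The summary does not give SaFaRI's formulation. As a hypothetical illustration of what "data fidelity in both spatial and frequency domains" can mean, the sketch below scores a candidate restoration against a noisy observation with a pixel-space mean squared error plus a weighted squared error between 2-D Fourier amplitude spectra. The function name and the weight `lam` are assumptions, and the degradation operator is taken to be the identity (pure Gaussian denoising).

```python
import numpy as np

def spatial_frequency_fidelity(x_est, y_obs, lam=0.1):
    """Hypothetical fidelity term: spatial MSE plus `lam` times the MSE between
    the amplitude spectra of x_est and y_obs (identity degradation assumed)."""
    spatial = np.mean((x_est - y_obs) ** 2)
    amp_est = np.abs(np.fft.fft2(x_est))       # Fourier amplitude of the estimate
    amp_obs = np.abs(np.fft.fft2(y_obs))       # Fourier amplitude of the observation
    frequency = np.mean((amp_est - amp_obs) ** 2)
    return spatial + lam * frequency

rng = np.random.default_rng(0)
clean = rng.random((8, 8))
noisy = clean + 0.1 * rng.standard_normal((8, 8))   # Gaussian-noise observation
print(spatial_frequency_fidelity(noisy, noisy))     # 0.0: perfect agreement with y
print(spatial_frequency_fidelity(clean, noisy) > 0) # True
```

In a diffusion-based restorer, a term like this would typically be differentiated with respect to the current estimate to guide each sampling step; how SaFaRI actually combines the two domains is not stated in the summary.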
BioBench: A Blueprint to Move Beyond ImageNet for Scientific ML Benchmarks
Positive | Artificial Intelligence
BioBench is introduced as an open ecology vision benchmark that addresses the limitations of ImageNet in predicting performance on scientific imagery. It encompasses 9 application-driven tasks, 4 taxonomic kingdoms, and 6 acquisition modalities, totaling 3.1 million images. The benchmark aims to enhance ecological research by providing a unified platform for evaluating visual representation quality in ecological tasks.
ANTS: Adaptive Negative Textual Space Shaping for OOD Detection via Test-Time MLLM Understanding and Reasoning
Positive | Artificial Intelligence
The paper presents ANTS, an innovative method for enhancing Out-of-Distribution (OOD) detection by utilizing Adaptive Negative Textual Space. By leveraging multimodal large language models (MLLMs), the approach generates expressive negative sentences that accurately characterize OOD distributions. This method addresses the limitations of existing techniques, particularly in near-OOD detection, by caching images likely to be OOD samples and prompting MLLMs for detailed descriptions.
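The summary does not spell out how ANTS turns its negative sentences into an OOD score. A common scheme for CLIP-style detectors that use negative labels is to take a softmax over the image's similarities to both in-distribution (ID) labels and negative texts, then sum the probability mass on the ID labels. The sketch below assumes that scheme; the embeddings are assumed to come from a vision-language encoder that is not shown, and the function names and the `temperature` value are hypothetical.

```python
import numpy as np

def normalize(v):
    """L2-normalise along the last axis so dot products become cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def negative_label_ood_score(image_emb, id_text_embs, neg_text_embs, temperature=0.01):
    """Assumed scoring rule: probability mass on ID labels after a softmax over
    ID + negative text similarities. Higher means more likely in-distribution."""
    img = normalize(image_emb)
    texts = normalize(np.vstack([id_text_embs, neg_text_embs]))
    logits = texts @ img / temperature
    logits = logits - logits.max()             # numerical stability before exp
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs[: len(id_text_embs)].sum()

# Toy 3-D embeddings: two ID class texts and one negative sentence.
id_texts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
neg_texts = np.array([[0.0, 0.0, 1.0]])
print(negative_label_ood_score(np.array([1.0, 0.0, 0.0]), id_texts, neg_texts))  # close to 1
print(negative_label_ood_score(np.array([0.0, 0.0, 1.0]), id_texts, neg_texts))  # close to 0
```

Under this reading, richer negative sentences from an MLLM simply add rows to `neg_text_embs`, giving near-OOD images more plausible non-ID texts to match against.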
InvFusion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems
Positive | Artificial Intelligence
InvFusion is a novel approach that integrates supervised and zero-shot diffusion methods for solving inverse problems. It addresses the limitations of existing models by providing a degradation-aware posterior sampler that improves accuracy while maintaining flexibility. By combining the strengths of training-based and zero-shot techniques, it marks a step forward in applying diffusion models across various fields.
Learning to Expand Images for Efficient Visual Autoregressive Modeling
Positive | Artificial Intelligence
The paper introduces Expanding Autoregressive Representation (EAR), a new paradigm for visual generation that mimics the human visual system's center-outward perception. This method improves efficiency by unfolding image tokens in a spiral order, allowing for parallel decoding and preserving spatial continuity. Additionally, a length-adaptive decoding strategy is proposed to enhance flexibility and speed, ultimately reducing computational costs and improving generation quality.
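A minimal sketch of a center-outward spiral ordering, assuming tokens on an h x w grid are grouped into square rings around the grid centre and each ring is traversed clockwise. Treating each ring as one parallel decoding step is a hypothetical grouping chosen for illustration; EAR's exact order and its length-adaptive schedule may differ. The function name `spiral_rings` is also an assumption.

```python
import math

def spiral_rings(h, w):
    """Return token positions grouped into centre-outward rings, each ring sorted
    clockwise (in row-down image coordinates) so the traversal stays spatially continuous."""
    cy, cx = (h - 1) / 2, (w - 1) / 2          # grid centre (may fall between tokens)
    rings = {}
    for r in range(h):
        for c in range(w):
            ring = int(max(abs(r - cy), abs(c - cx)))   # Chebyshev distance -> ring index
            rings.setdefault(ring, []).append((r, c))
    # Sort each ring by angle around the centre; atan2 starts near the left edge
    # and sweeps through top, right and bottom.
    return [sorted(rings[k], key=lambda p: math.atan2(p[0] - cy, p[1] - cx))
            for k in sorted(rings)]

groups = spiral_rings(3, 3)
flat = [r * 3 + c for ring in groups for (r, c) in ring]
print(flat)   # [4, 0, 1, 2, 5, 8, 7, 6, 3]
```

Concatenating the rings gives a single spiral sequence for autoregressive decoding, while the ring boundaries mark sets of tokens that could, under this assumption, be predicted together in one step.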