Progressive Supernet Training for Efficient Visual Autoregressive Modeling

arXiv — cs.CV•Friday, November 21, 2025 at 5:00:00 AM

Positive•Artificial Intelligence

The introduction of VARiant marks a significant advancement in Visual Autoregressive (VAR) modeling.
This development is crucial as it addresses the limitations of existing VAR models, particularly in practical deployment scenarios where memory overhead can hinder performance.
The broader implications of this work resonate with ongoing efforts in the AI community to enhance model efficiency and reduce biases, as seen in related approaches that tackle issues in computer vision and generative modeling.

— via World Pulse Now AI Editorial System

Continue Reading

arXiv — cs.CV•2 days ago

Unsupervised Image Classification with Adaptive Nearest Neighbor Selection and Cluster Ensembles

Positive•Artificial Intelligence

The paper presents a novel approach to unsupervised image classification, focusing on clustering unlabeled images into meaningful categories. The method, named Image Clustering through Cluster Ensembles (ICCE), enhances clustering performance by integrating adaptive nearest neighbor selection and cluster ensembling strategies. This approach allows for the training of multiple clustering heads on a fixed backbone, resulting in diverse clusterings that are consolidated into a unified consensus clustering.

arXiv — cs.LG•2 days ago

Spatial-and-Frequency-aware Restoration method for Images based on Diffusion Models

Positive•Artificial Intelligence

The paper presents SaFaRI, a spatial-and-frequency-aware diffusion model designed for image restoration (IR) that effectively handles Gaussian noise. This model enhances reconstruction quality by maintaining data fidelity in both spatial and frequency domains. Comprehensive evaluations demonstrate that SaFaRI outperforms existing zero-shot IR methods on ImageNet and FFHQ datasets, achieving state-of-the-art performance in various noisy inverse problems.

arXiv — cs.CV•2 days ago

BioBench: A Blueprint to Move Beyond ImageNet for Scientific ML Benchmarks

Positive•Artificial Intelligence

BioBench is introduced as an open ecology vision benchmark that addresses the limitations of ImageNet in predicting performance on scientific imagery. It encompasses 9 application-driven tasks, 4 taxonomic kingdoms, and 6 acquisition modalities, totaling 3.1 million images. The benchmark aims to enhance ecological research by providing a unified platform for evaluating visual representation quality in ecological tasks.

arXiv — cs.CV•3 days ago

ANTS: Adaptive Negative Textual Space Shaping for OOD Detection via Test-Time MLLM Understanding and Reasoning

Positive•Artificial Intelligence

The paper presents ANTS, an innovative method for enhancing Out-of-Distribution (OOD) detection by utilizing Adaptive Negative Textual Space. By leveraging multimodal large language models (MLLMs), the approach generates expressive negative sentences that accurately characterize OOD distributions. This method addresses the limitations of existing techniques, particularly in near-OOD detection, by caching images likely to be OOD samples and prompting MLLMs for detailed descriptions.

arXiv — cs.CV•3 days ago

InvFusion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems

Positive•Artificial Intelligence

InvFusion is a novel approach that integrates supervised and zero-shot diffusion methods for solving inverse problems. It addresses the limitations of existing models by providing a degradation-aware posterior sampler that enhances accuracy while maintaining flexibility. This innovation is significant as it combines the strengths of both training-based and zero-shot techniques, marking a step forward in the application of diffusion models in various fields.

arXiv — cs.CV•3 days ago

Learning to Expand Images for Efficient Visual Autoregressive Modeling

Positive•Artificial Intelligence

The paper introduces Expanding Autoregressive Representation (EAR), a new paradigm for visual generation that mimics the human visual system's center-outward perception. This method improves efficiency by unfolding image tokens in a spiral order, allowing for parallel decoding and preserving spatial continuity. Additionally, a length-adaptive decoding strategy is proposed to enhance flexibility and speed, ultimately reducing computational costs and improving generation quality.
