arXiv:2509.01421v3 Announce Type: replace 
Abstract: Diffusion models (DMs) have become dominant in visual generation but suffer a performance drop when tested at resolutions that differ from the training scale, whether lower or higher. The key challenge in generating variable-scaled images lies in the differing amounts of information across resolutions, which requires the information conversion procedures to vary with the target scale. In this paper, we investigate three critical aspects of DMs for a unified analysis of variable-scaled generation: dilated convolution, attention mechanisms, and initial noise. Specifically, 1) dilated convolution in DMs loses high-frequency information in higher-resolution generation; 2) attention struggles to adjust information aggregation adaptively in variable-scaled image generation; 3) the spatial distribution of information in the initial noise is misaligned with variable-scaled images. To solve these problems, we propose InfoScale, an information-centric framework for variable-scaled image generation that effectively utilizes information in each of the three aspects. For the information loss in 1), we introduce a Progressive Frequency Compensation module to compensate for the high-frequency information lost by dilated convolution in higher-resolution generation. For the information aggregation inflexibility in 2), we introduce an Adaptive Information Aggregation module to adaptively aggregate information in lower-resolution generation and to balance local and global information in higher-resolution generation. For the information distribution misalignment in 3), we design a Noise Adaptation module to redistribute information in the initial noise for variable-scaled generation. Our method is plug-and-play for DMs, and extensive experiments demonstrate its effectiveness in variable-scaled image generation.
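The abstract does not say how the Progressive Frequency Compensation module is implemented. Purely as an illustration of the general idea of restoring lost high-frequency content, the sketch below splits a 1D signal into a low band (moving average) and a high band (residual), then rebuilds an output from the low band of one signal and the high band of another. The function names `box_blur`, `split_bands`, and `compensate`, the 1D setting, and the moving-average filter are all assumptions made for this example, not the authors' method.

```python
def box_blur(signal, radius=1):
    """Low-pass a 1D signal with a moving average; edge samples are clamped."""
    n = len(signal)
    out = []
    for i in range(n):
        # Clamp out-of-range indices to the nearest valid sample.
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def split_bands(signal, radius=1):
    """Return (low, high) with low + high equal to the input signal."""
    low = box_blur(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def compensate(coarse, detailed, radius=1):
    """Hypothetical compensation: low band of `coarse` plus high band of `detailed`."""
    low, _ = split_bands(coarse, radius)
    _, high = split_bands(detailed, radius)
    return [l + h for l, h in zip(low, high)]


if __name__ == "__main__":
    detailed = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # rapidly alternating signal
    coarse = [0.5] * 6                         # same mean, no fine detail
    print(compensate(coarse, detailed))
```

In this toy setting the flat `coarse` signal regains the alternating detail of `detailed`. A real module would presumably operate on 2D feature maps inside the denoising network, which the abstract leaves unspecified.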

A recent study introduces InfoScale, a novel approach aimed at improving variable-scaled image generation using diffusion models. The research highlights challenges such as the loss of high-frequency information in dilated convolution, difficulties in adaptive information aggregation, and misalignment of initial noise with variable-scaled images. These issues hinder performance when generating images at resolutions different from the training scale.

InfoScale: Unleashing Training-free Variable-scaled Image Generation via Effective Utilization of Information

arXiv:2511.18281v1 Announce Type: new 
Abstract: Diffusion models (DMs) produce high-quality images, yet their sampling remains costly when adapted to new domains. Distilled DMs are faster but typically remain confined within their teacher's domain. Thus, fast and high-quality generation for novel domains relies on two-stage training pipelines: Adapt-then-Distill or Distill-then-Adapt. However, both add design complexity and suffer from degraded quality or diversity. We introduce Uni-DAD, a single-stage pipeline that unifies distillation and adaptation of DMs. It couples two signals during training: (i) a dual-domain distribution-matching distillation objective that guides the student toward the distributions of the source teacher and a target teacher, and (ii) a multi-head generative adversarial network (GAN) loss that encourages target realism across multiple feature scales. The source-domain distillation preserves diverse source knowledge, while the multi-head GAN stabilizes training and reduces overfitting, especially in few-shot regimes. The inclusion of a target teacher facilitates adaptation to more structurally distant domains. We perform evaluations on a variety of datasets for few-shot image generation (FSIG) and subject-driven personalization (SDP). Uni-DAD delivers higher quality than state-of-the-art (SoTA) adaptation methods even with fewer than 4 sampling steps, and outperforms two-stage training pipelines in both quality and diversity.
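The abstract describes two training signals but gives no formula for how they are combined. The sketch below shows one plausible way to fold a source-teacher term, a target-teacher term, and several GAN head losses into a single scalar objective. The function name `combined_objective`, the weights `w_source`, `w_target`, and `w_gan`, and the choice to average over GAN heads are all assumptions made for illustration and are not taken from the paper.

```python
def combined_objective(dmd_source, dmd_target, gan_head_losses,
                       w_source=1.0, w_target=1.0, w_gan=1.0):
    """Combine scalar loss terms into one single-stage objective (illustrative).

    dmd_source, dmd_target: distribution-matching terms for the source and
        target teachers, assumed to be already computed as floats.
    gan_head_losses: one adversarial loss per discriminator head, for
        example one per feature scale.
    """
    if not gan_head_losses:
        raise ValueError("expected at least one GAN head loss")
    # Assumption: heads are averaged so the GAN weight does not depend
    # on how many heads are used.
    gan = sum(gan_head_losses) / len(gan_head_losses)
    return w_source * dmd_source + w_target * dmd_target + w_gan * gan


if __name__ == "__main__":
    print(combined_objective(0.5, 0.25, [1.0, 3.0]))
```

Because both terms feed one objective, a single optimizer step updates the student for distillation and adaptation at once, which is the contrast the abstract draws with the two-stage pipelines.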

A new study introduces Uni-DAD, a unified approach to the distillation and adaptation of diffusion models, aimed at improving few-step, few-shot image generation. The method combines a dual-domain distribution-matching distillation objective with a multi-head GAN loss in a single-stage pipeline, addressing the limitations of traditional two-stage training pipelines, which often compromise image quality and diversity.

