arXiv:2505.21473v2 Announce Type: replace 
Abstract: This paper presents DetailFlow, a coarse-to-fine 1D autoregressive (AR) image generation method that models images through a novel next-detail prediction strategy. By learning a resolution-aware token sequence supervised with progressively degraded images, DetailFlow lets the generation process start from the global structure and incrementally refine details. This coarse-to-fine 1D token sequence aligns well with the autoregressive inference mechanism, giving the AR model a more natural and efficient way to generate complex visual content. Our compact 1D AR model achieves high-quality image synthesis with significantly fewer tokens than previous approaches, e.g., VAR and VQGAN. We further propose a parallel inference mechanism with self-correction that accelerates generation by approximately 8x while reducing the accumulated sampling error inherent in teacher-forcing supervision. On the ImageNet 256x256 benchmark, our method achieves 2.96 gFID with 128 tokens, outperforming VAR (3.3 FID) and FlexVAR (3.05 FID), which both require 680 tokens in their AR models. Moreover, owing to the significantly reduced token count and the parallel inference mechanism, our method achieves nearly 2x faster inference than VAR and FlexVAR. Extensive experimental results demonstrate DetailFlow's superior generation quality and efficiency compared to existing state-of-the-art methods.
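To make the idea of a resolution-aware token sequence concrete, the sketch below pairs each token-prefix length k with a progressively degraded target image. The abstract does not give the actual schedule or degradation, so everything here is an assumption: the helper names `target_resolution` and `degraded_target` are hypothetical, the target side length is assumed to grow as sqrt(k / N) (on the premise that the information in a prefix scales with image area), and nearest-neighbour resizing stands in for whatever anti-aliased resampling the authors actually use. Images are assumed square and grayscale.

```python
import math
from typing import List

# A square grayscale image stored as rows of pixel values (assumption for this sketch).
Image = List[List[float]]


def target_resolution(k: int, n_tokens: int = 128, full_res: int = 256,
                      min_res: int = 16) -> int:
    """Hypothetical schedule: side length a k-token prefix should reconstruct.

    Assumes information grows with image area, so side ~ full_res * sqrt(k / n_tokens).
    """
    if not 1 <= k <= n_tokens:
        raise ValueError("prefix length k must be in [1, n_tokens]")
    if not 1 <= min_res <= full_res:
        raise ValueError("min_res must be in [1, full_res]")
    res = round(full_res * math.sqrt(k / n_tokens))
    return max(min_res, min(full_res, res))


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize; a stand-in for an anti-aliased resampler."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def degraded_target(img: Image, k: int, n_tokens: int = 128,
                    min_res: int = 16) -> Image:
    """Hypothetical supervision target for a k-token prefix.

    Downsamples the image to the prefix's resolution and upsamples it back,
    so short prefixes are compared against coarser (here, blockier) images.
    """
    size = len(img)  # assumes a square image
    r = target_resolution(k, n_tokens, full_res=size, min_res=min_res)
    return resize_nearest(resize_nearest(img, r, r), size, size)
```

Under this assumed schedule, a 2-token prefix of a 128-token sequence would be supervised with a 32x32 version of a 256x256 image, while the full sequence is supervised with the original, so early tokens are pushed toward global structure and later tokens toward fine detail.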

DetailFlow is a new autoregressive image generation method that uses a next-detail prediction strategy to improve image synthesis. It generates high-quality images with significantly fewer tokens than existing methods, achieving a gFID of 2.96 on the ImageNet 256x256 benchmark with 128 tokens. Its parallel inference mechanism also accelerates generation by approximately 8x, a significant advance in AI image generation.
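The token counts in the comparison can be checked with a little arithmetic. At 256x256, VAR predicts a pyramid of s x s token maps with side lengths 1, 2, 3, 4, 5, 6, 8, 10, 13 and 16, which sums to the 680 tokens quoted in the abstract; DetailFlow uses 128, about 5.3x fewer. VAR emits all tokens of one scale in a single step, so token count and number of forward passes are not the same thing. The `grouped_decode_steps` helper is a hypothetical illustration of how emitting 8 tokens per forward pass would turn 128 sequential steps into 16; the abstract does not specify DetailFlow's actual grouping, and its self-correction step is not modeled here.

```python
import math

# Token-map side lengths VAR uses at 256x256; each scale is an s x s grid of tokens.
VAR_SCALES = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)
DETAILFLOW_TOKENS = 128  # token count reported in the abstract


def var_token_count(scales=VAR_SCALES) -> int:
    """Total tokens across all scales of a VAR-style pyramid."""
    return sum(s * s for s in scales)


def grouped_decode_steps(n_tokens: int, group_size: int) -> int:
    """Hypothetical: forward passes if tokens are emitted group_size at a time."""
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return math.ceil(n_tokens / group_size)


print(var_token_count())                                # 680
print(round(var_token_count() / DETAILFLOW_TOKENS, 2))  # 5.31
steps = grouped_decode_steps(DETAILFLOW_TOKENS, 8)
print(steps, DETAILFLOW_TOKENS / steps)                 # 16 8.0
```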

DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction
