Training-Free Diffusion Priors for Text-to-Image Generation via Optimization-based Visual Inversion

arXiv — cs.LG · Thursday, November 27, 2025
  • A recent study introduces Optimization-based Visual Inversion (OVI), a training-free method for text-to-image generation that challenges the reliance on computationally expensive diffusion prior networks. Rather than training a prior, OVI directly optimizes a latent visual representation to align with the textual prompt, using novel constraints to enhance the realism of the generated images.
  • This development is significant as it reduces the need for extensive training on large datasets, potentially democratizing access to advanced image generation technologies and streamlining the creative process for artists and developers alike.
  • The introduction of OVI aligns with ongoing innovations in the field of image generation, such as the Uni-DAD approach for few-shot image generation and the GridAR framework for autoregressive models, indicating a trend towards more efficient and adaptable methods in AI-driven visual content creation.
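The core idea, optimizing a latent directly against a prompt embedding rather than training a prior network, can be sketched in miniature. This is a hedged illustration, not the paper's method: the "text embedding" and "latent" share one toy vector space, alignment is a squared distance, and the realism constraint is reduced to a simple L2 prior; the names `ovi_step` and `prior_weight` are illustrative.

```python
# Toy sketch of optimization-based inversion: descend on
# L(z) = ||z - t||^2 + w * ||z||^2, where t stands in for a text
# embedding and the L2 term stands in for a realism constraint.
# (Illustrative only; the actual OVI objective differs.)

def ovi_step(z, t, lr=0.1, prior_weight=0.01):
    """One gradient-descent step on the toy inversion objective."""
    grad = [2 * (zi - ti) + 2 * prior_weight * zi for zi, ti in zip(z, t)]
    return [zi - lr * gi for zi, gi in zip(z, grad)]

def invert(t, steps=200):
    z = [0.0] * len(t)          # start from a neutral latent
    for _ in range(steps):
        z = ovi_step(z, t)
    return z

target = [1.0, -2.0, 0.5]       # stand-in for a prompt embedding
latent = invert(target)         # converges toward t / (1 + prior_weight)
```

The point of the sketch is that no learned prior is involved: everything happens at inference time, by iterating a gradient step on a per-prompt objective.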
— via World Pulse Now AI Editorial System


Continue Reading
A Diffusion Model Framework for Maximum Entropy Reinforcement Learning
Positive · Artificial Intelligence
A new framework has been introduced that reinterprets Maximum Entropy Reinforcement Learning (MaxEntRL) as a diffusion model-based sampling problem, aiming to minimize the reverse Kullback-Leibler divergence between the diffusion policy and the optimal policy distribution. This approach leads to the development of diffusion-based variants of existing algorithms such as Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Wasserstein Policy Optimization (WPO).
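The quantity this framework minimizes, the reverse Kullback-Leibler divergence KL(q ‖ p) from the diffusion policy q to the optimal policy p, can be estimated by Monte Carlo from samples of q alone. The following is a hedged toy sketch with 1-D Gaussians standing in for both distributions; the function names are illustrative and not from the paper.

```python
# Monte Carlo estimate of the reverse KL, KL(q || p) = E_q[log q - log p],
# using 1-D Gaussians as stand-ins for the policy q and target p.
import math
import random

def log_normal(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def reverse_kl(mu_q, sigma_q, mu_p, sigma_p, n=100_000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu_q, sigma_q)   # sample from the policy q
        total += log_normal(x, mu_q, sigma_q) - log_normal(x, mu_p, sigma_p)
    return total / n

# For N(0,1) vs N(1,1) the analytic reverse KL is 0.5.
est = reverse_kl(0.0, 1.0, 1.0, 1.0)
```

Sampling from q (rather than p) is what makes this the *reverse* KL; in the framework above, those samples would come from the diffusion policy itself.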