Clothing agnostic Pre-inpainting Virtual Try-ON

arXiv — cs.CV · Thursday, November 20, 2025 at 5:00:00 AM


Continue Reading
SpecDiff: Accelerating Diffusion Model Inference with Self-Speculation
Positive · Artificial Intelligence
A new paradigm called SpecDiff has been introduced to accelerate diffusion model inference by utilizing self-speculation, which incorporates future information alongside historical data. This approach aims to enhance accuracy and speed in the inference process by employing a training-free multi-level feature caching strategy, including a feature selection algorithm based on self-speculative information.
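The core idea of reusing cached features across diffusion steps can be illustrated with a toy policy: keep a per-layer cache and skip recomputation when the fresh feature has barely moved. This is a minimal sketch with hypothetical names (`maybe_reuse`, `mid_block`) and a simple relative-distance threshold; SpecDiff's actual policy is speculative and multi-level, not this drift check.

```python
import numpy as np

def maybe_reuse(cache, layer, step_feat, tol=0.05):
    """Reuse the cached feature for a layer when the fresh one has
    barely changed; otherwise refresh the cache. Illustrative only --
    not the SpecDiff selection algorithm."""
    prev = cache.get(layer)
    if prev is not None:
        drift = np.linalg.norm(step_feat - prev) / (np.linalg.norm(prev) + 1e-8)
        if drift < tol:
            return prev, True   # cache hit: downstream layers could be skipped
    cache[layer] = step_feat
    return step_feat, False

cache = {}
f1 = np.ones(4)
_, hit1 = maybe_reuse(cache, "mid_block", f1)          # first step: miss
_, hit2 = maybe_reuse(cache, "mid_block", f1 * 1.01)   # tiny drift: hit
```

In a real pipeline the payoff comes from skipping the expensive layer computations on a hit, not from the cache lookup itself.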
SYNAPSE: Synergizing an Adapter and Finetuning for High-Fidelity EEG Synthesis from a CLIP-Aligned Encoder
Positive · Artificial Intelligence
SYNAPSE is a newly introduced framework that integrates an adapter and fine-tuning techniques to enhance high-fidelity EEG synthesis from a CLIP-aligned encoder. This two-stage approach aims to improve the representation of EEG signals, addressing challenges such as noise and inter-subject variability that have hindered previous image generation methods based on brain signals.
DICE: Distilling Classifier-Free Guidance into Text Embeddings
Positive · Artificial Intelligence
The paper presents DICE, a novel approach that distills Classifier-Free Guidance (CFG) into text embeddings, significantly reducing computational complexity while maintaining high-quality image generation in text-to-image diffusion models. This method addresses the common issue of misalignment between text prompts and generated images, which has been a challenge in the field.
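For context, standard classifier-free guidance doubles the cost of each denoising step by running a conditional and an unconditional forward pass, then extrapolating between them. A minimal sketch of that baseline combination rule (with toy arrays standing in for the two UNet noise predictions) shows the redundancy DICE targets by distilling the guidance effect into the text embedding:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by the guidance scale."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-ins for the two UNet forward passes per step.
eps_uncond = np.array([0.1, 0.2])
eps_cond = np.array([0.3, 0.6])

guided = cfg_combine(eps_uncond, eps_cond, guidance_scale=7.5)
# At scale 1.0 the rule collapses to the conditional prediction alone --
# the single-pass regime a distilled embedding aims to recover.
```

Note the two forward passes per step in vanilla CFG; a distilled embedding would let the model reach `guided` from one conditional pass.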
Rethinking Garment Conditioning in Diffusion-based Virtual Try-On
Positive · Artificial Intelligence
A new study has introduced Re-CatVTON, an efficient single UNet model for Virtual Try-On (VTON) that enhances the garment conditioning process while reducing computational overhead. This model builds on the insights gained from analyzing context features in diffusion-based VTON, which previously relied on more complex Dual UNet architectures.
Deepfake Geography: Detecting AI-Generated Satellite Images
Neutral · Artificial Intelligence
Recent advancements in AI, particularly with generative models like StyleGAN2 and Stable Diffusion, have raised concerns about the authenticity of satellite imagery, which is crucial for scientific and security analyses. A study has compared Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) for detecting AI-generated satellite images, revealing that ViTs outperform CNNs in accuracy and robustness.
CoD: A Diffusion Foundation Model for Image Compression
Positive · Artificial Intelligence
CoD, a new compression-oriented diffusion foundation model, has been introduced to enhance image compression efficiency, particularly at ultra-low bitrates. Unlike existing models that rely on text conditioning, CoD is designed for end-to-end optimization of both compression and generation, achieving state-of-the-art results when integrated with downstream codecs like DiffC.
Model-Agnostic Gender Bias Control for Text-to-Image Generation via Sparse Autoencoder
Positive · Artificial Intelligence
A new framework called SAE Debias has been introduced to address gender bias in text-to-image (T2I) generation models, particularly those that generate stereotypical associations between professions and gendered subjects. This model-agnostic approach utilizes a k-sparse autoencoder to identify and suppress biased directions during image generation, aiming for more gender-balanced outputs without requiring model-specific adjustments.
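The sparsity mechanism of a k-sparse autoencoder is simple to sketch: encode, keep only the k largest latent activations per sample, and zero the rest; a biased direction found among those latents can then be suppressed by zeroing its coefficient before decoding. The snippet below shows just that top-k step and a hypothetical suppression of one latent index; it is an illustration of the general mechanism, not SAE Debias itself.

```python
import numpy as np

def topk_sparse_code(activations, k):
    """Keep the k largest activations per sample, zeroing the rest --
    the sparsity constraint of a k-sparse autoencoder."""
    codes = np.zeros_like(activations)
    idx = np.argsort(activations, axis=-1)[..., -k:]
    np.put_along_axis(codes, idx,
                      np.take_along_axis(activations, idx, axis=-1), axis=-1)
    return codes

z = np.array([[0.9, 0.1, 0.5, 0.3]])
codes = topk_sparse_code(z, k=2)          # only 0.9 and 0.5 survive

# Hypothetical debiasing step: suppress a latent identified as a
# biased direction (index 0 here, chosen purely for illustration).
debiased = codes.copy()
debiased[..., 0] = 0.0
```

In the paper's setting the latent codes live inside the T2I model's representation space, and suppression happens during image generation.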
Learning Latent Transmission and Glare Maps for Lens Veiling Glare Removal
Positive · Artificial Intelligence
A new generative model named VeilGen has been proposed to address the challenge of veiling glare in compact optical systems, which is often exacerbated by stray-light scattering from non-ideal surfaces. This model learns to simulate veiling glare by estimating optical transmission and glare maps from target images in an unsupervised manner, marking a significant advancement in lens performance enhancement.