Clothing agnostic Pre-inpainting Virtual Try-ON

arXiv — cs.CV · Thursday, November 20, 2025 at 5:00:00 AM


Continue Reading
SpecDiff: Accelerating Diffusion Model Inference with Self-Speculation
Positive · Artificial Intelligence
A new paradigm called SpecDiff has been introduced to accelerate diffusion model inference by utilizing self-speculation, which incorporates future information alongside historical data. This approach aims to enhance accuracy and speed in the inference process by employing a training-free multi-level feature caching strategy, including a feature selection algorithm based on self-speculative information.
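The core idea of reusing cached features across diffusion steps can be illustrated with a toy policy: keep a per-layer cache and skip recomputation when the fresh feature has barely moved. This is a minimal sketch with hypothetical names (`maybe_reuse`, `mid_block`) and a simple relative-distance threshold; SpecDiff's actual policy is speculative and multi-level, not this drift check.

```python
import numpy as np

def maybe_reuse(cache, layer, step_feat, tol=0.05):
    """Reuse the cached feature for a layer when the fresh one has
    barely changed; otherwise refresh the cache. Illustrative only --
    not the SpecDiff selection algorithm."""
    prev = cache.get(layer)
    if prev is not None:
        drift = np.linalg.norm(step_feat - prev) / (np.linalg.norm(prev) + 1e-8)
        if drift < tol:
            return prev, True   # cache hit: downstream layers could be skipped
    cache[layer] = step_feat
    return step_feat, False

cache = {}
f1 = np.ones(4)
_, hit1 = maybe_reuse(cache, "mid_block", f1)          # first step: miss
_, hit2 = maybe_reuse(cache, "mid_block", f1 * 1.01)   # tiny drift: hit
```

In a real pipeline the payoff comes from skipping the expensive layer computations on a hit, not from the cache lookup itself.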
SYNAPSE: Synergizing an Adapter and Finetuning for High-Fidelity EEG Synthesis from a CLIP-Aligned Encoder
Positive · Artificial Intelligence
SYNAPSE is a newly introduced framework that integrates an adapter and fine-tuning techniques to enhance high-fidelity EEG synthesis from a CLIP-aligned encoder. This two-stage approach aims to improve the representation of EEG signals, addressing challenges such as noise and inter-subject variability that have hindered previous image generation methods based on brain signals.
DICE: Distilling Classifier-Free Guidance into Text Embeddings
Positive · Artificial Intelligence
The paper presents DICE, a novel approach that distills Classifier-Free Guidance (CFG) into text embeddings, significantly reducing computational complexity while maintaining high-quality image generation in text-to-image diffusion models. This method addresses the common issue of misalignment between text prompts and generated images, which has been a challenge in the field.
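For context, standard classifier-free guidance doubles the cost of each denoising step by running a conditional and an unconditional forward pass, then extrapolating between them. A minimal sketch of that baseline combination rule (with toy arrays standing in for the two UNet noise predictions) shows the redundancy DICE targets by distilling the guidance effect into the text embedding:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by the guidance scale."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-ins for the two UNet forward passes per step.
eps_uncond = np.array([0.1, 0.2])
eps_cond = np.array([0.3, 0.6])

guided = cfg_combine(eps_uncond, eps_cond, guidance_scale=7.5)
# At scale 1.0 the rule collapses to the conditional prediction alone --
# the single-pass regime a distilled embedding aims to recover.
```

Note the two forward passes per step in vanilla CFG; a distilled embedding would let the model reach `guided` from one conditional pass.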
Rethinking Garment Conditioning in Diffusion-based Virtual Try-On
Positive · Artificial Intelligence
A new study has introduced Re-CatVTON, an efficient single UNet model for Virtual Try-On (VTON) that enhances the garment conditioning process while reducing computational overhead. This model builds on the insights gained from analyzing context features in diffusion-based VTON, which previously relied on more complex Dual UNet architectures.
Deepfake Geography: Detecting AI-Generated Satellite Images
Neutral · Artificial Intelligence
Recent advancements in AI, particularly with generative models like StyleGAN2 and Stable Diffusion, have raised concerns about the authenticity of satellite imagery, which is crucial for scientific and security analyses. A study has compared Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) for detecting AI-generated satellite images, revealing that ViTs outperform CNNs in accuracy and robustness.
CoD: A Diffusion Foundation Model for Image Compression
Positive · Artificial Intelligence
CoD, a new compression-oriented diffusion foundation model, has been introduced to enhance image compression efficiency, particularly at ultra-low bitrates. Unlike existing models that rely on text conditioning, CoD is designed for end-to-end optimization of both compression and generation, achieving state-of-the-art results when integrated with downstream codecs like DiffC.
Model-Agnostic Gender Bias Control for Text-to-Image Generation via Sparse Autoencoder
Positive · Artificial Intelligence
A new framework called SAE Debias has been introduced to address gender bias in text-to-image (T2I) generation models, particularly those that generate stereotypical associations between professions and gendered subjects. This model-agnostic approach utilizes a k-sparse autoencoder to identify and suppress biased directions during image generation, aiming for more gender-balanced outputs without requiring model-specific adjustments.
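The sparsity mechanism of a k-sparse autoencoder is simple to sketch: encode, keep only the k largest latent activations per sample, and zero the rest; a biased direction found among those latents can then be suppressed by zeroing its coefficient before decoding. The snippet below shows just that top-k step and a hypothetical suppression of one latent index; it is an illustration of the general mechanism, not SAE Debias itself.

```python
import numpy as np

def topk_sparse_code(activations, k):
    """Keep the k largest activations per sample, zeroing the rest --
    the sparsity constraint of a k-sparse autoencoder."""
    codes = np.zeros_like(activations)
    idx = np.argsort(activations, axis=-1)[..., -k:]
    np.put_along_axis(codes, idx,
                      np.take_along_axis(activations, idx, axis=-1), axis=-1)
    return codes

z = np.array([[0.9, 0.1, 0.5, 0.3]])
codes = topk_sparse_code(z, k=2)          # only 0.9 and 0.5 survive

# Hypothetical debiasing step: suppress a latent identified as a
# biased direction (index 0 here, chosen purely for illustration).
debiased = codes.copy()
debiased[..., 0] = 0.0
```

In the paper's setting the latent codes live inside the T2I model's representation space, and suppression happens during image generation.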
Learning Latent Transmission and Glare Maps for Lens Veiling Glare Removal
Positive · Artificial Intelligence
A new generative model named VeilGen has been proposed to address the challenge of veiling glare in compact optical systems, which is often exacerbated by stray-light scattering from non-ideal surfaces. This model learns to simulate veiling glare by estimating optical transmission and glare maps from target images in an unsupervised manner, marking a significant advancement in lens performance enhancement.