DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection

arXiv — cs.CV · Tuesday, November 25, 2025 at 5:00:00 AM
  • The introduction of DiffSeg30k marks a significant advance in the detection of AI-generated content (AIGC): a dataset of 30,000 diffusion-edited images with pixel-level annotations. It enables fine-grained detection of localized edits, addressing a gap in existing benchmarks, which typically classify entire images as real or generated without localizing modifications (a minimal evaluation sketch follows the summary below).
  • This matters for the accuracy of AIGC detection methods, as it lets researchers and developers identify edited regions directly and analyze how diffusion-based editing affects image authenticity and integrity.
  • The emergence of such datasets reflects growing recognition of the challenges posed by advanced AI editing tools, paralleling ongoing work in object detection and image forgery detection. As AI-generated content becomes more prevalent, the need for robust detection frameworks grows increasingly urgent, prompting innovations in data curation and model training methodologies.
— via World Pulse Now AI Editorial System
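
To make the pixel-level framing concrete, below is a minimal sketch of how a localized-edit detector might be scored against per-pixel annotations. The array shapes, region coordinates, and the plain intersection-over-union metric are illustrative assumptions, not DiffSeg30k's published schema or official evaluation protocol.

```python
# Minimal sketch: scoring a predicted edit mask against a pixel-level
# ground-truth annotation. All shapes and regions are hypothetical.
import numpy as np

def edit_mask_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Intersection-over-union between a predicted binary edit mask and
    the annotated edited region."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # neither mask marks any pixel as edited
    return np.logical_and(pred, gt).sum() / union

# Hypothetical 256x256 image with one annotated diffusion-edited region.
gt = np.zeros((256, 256), dtype=np.uint8)
gt[64:128, 64:128] = 1       # ground-truth edited pixels
pred = np.zeros_like(gt)
pred[60:130, 60:130] = 1     # a detector's slightly looser prediction
print(f"edit-region IoU: {edit_mask_iou(pred, gt):.3f}")
```

Averaging such per-image scores over the benchmark would yield a localization metric that whole-image classifiers cannot earn by construction.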


Continue Reading
Percept-WAM: Perception-Enhanced World-Awareness-Action Model for Robust End-to-End Autonomous Driving
Positive · Artificial Intelligence
The introduction of Percept-WAM marks a significant advancement in autonomous driving technology, focusing on enhancing spatial perception through a unified vision-language model that integrates 2D and 3D scene understanding. This model addresses the limitations of existing systems, which often struggle with accuracy and stability in complex driving scenarios.
IAG: Input-aware Backdoor Attack on VLM-based Visual Grounding
Neutral · Artificial Intelligence
A novel vulnerability in vision-language models (VLMs) has been identified with the introduction of IAG, a method that enables multi-target backdoor attacks on VLM-based visual grounding systems. The technique uses dynamically generated, text-guided, input-aware triggers, allowing imperceptible manipulation of visual inputs while maintaining normal performance on benign samples.
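
As a rough illustration of the input-aware idea, here is a generic sketch of a trigger generator that conditions a bounded perturbation on each input image. It omits IAG's text guidance and is not the paper's architecture, only a common pattern for this class of attack.

```python
# Generic input-aware trigger sketch (not the IAG method): a small
# network emits a bounded perturbation conditioned on each image.
import torch
import torch.nn as nn

class TriggerGenerator(nn.Module):
    def __init__(self, eps: float = 8 / 255):
        super().__init__()
        self.eps = eps  # perturbation budget keeping the trigger imperceptible
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Tanh(),  # outputs in [-1, 1]
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Each image gets its own trigger, unlike a fixed static patch.
        return (image + self.eps * self.net(image)).clamp(0.0, 1.0)

gen = TriggerGenerator()
x = torch.rand(2, 3, 224, 224)  # hypothetical clean batch
poisoned = gen(x)               # visually near-identical, but triggered
```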
TRANSPORTER: Transferring Visual Semantics from VLM Manifolds
Positive · Artificial Intelligence
The paper introduces TRANSPORTER, a model-independent approach designed to enhance video generation by transferring visual semantics from Vision Language Models (VLMs). This method addresses the challenge of understanding how VLMs derive their predictions, particularly in complex scenes with various objects and actions. TRANSPORTER generates videos that reflect changes in captions across diverse attributes and contexts.
Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach
Positive · Artificial Intelligence
A new framework for deepfake detection, named Forgery-aware Audio-Visual Adaptation with Variational Bayes (FoVB), has been introduced to address the growing security concerns surrounding audio-visual deepfakes. This method leverages audio-visual correlation learning to identify subtle inconsistencies that can indicate forgery, utilizing variational Bayesian estimation to enhance detection accuracy.
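
The variational ingredient can be illustrated with a generic sketch: audio and visual features are fused into a Gaussian latent whose KL term regularizes the representation. This is a textbook variational-Bayes pattern under assumed feature dimensions, not the published FoVB architecture.

```python
# Generic variational fusion sketch (not the FoVB model): encode
# concatenated audio-visual features as a Gaussian latent, classify
# from a reparameterized sample, and return the KL regularizer.
import torch
import torch.nn as nn

class VariationalFusion(nn.Module):
    def __init__(self, a_dim: int = 128, v_dim: int = 128, z_dim: int = 64):
        super().__init__()
        self.enc = nn.Linear(a_dim + v_dim, 2 * z_dim)  # mean and log-variance
        self.cls = nn.Linear(z_dim, 1)                  # real-vs-fake logit

    def forward(self, audio_feat, visual_feat):
        mu, logvar = self.enc(torch.cat([audio_feat, visual_feat], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1).mean()
        return self.cls(z).squeeze(-1), kl

logit, kl = VariationalFusion()(torch.randn(4, 128), torch.randn(4, 128))
# A training loss would combine a classification term with beta * kl.
```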
MagicWand: A Universal Agent for Generation and Evaluation Aligned with User Preference
Positive · Artificial Intelligence
Recent advancements in artificial intelligence-generated content (AIGC) have led to the development of MagicWand, a universal agent designed to enhance content generation and evaluation based on user preferences. This innovation is supported by the creation of a large-scale dataset, UniPrefer-100K, which includes images, videos, and text that reflect user style preferences. Additionally, UniPreferBench has been introduced as a benchmark for assessing user preference alignment across diverse AIGC applications.
Zero-Reference Joint Low-Light Enhancement and Deblurring via Visual Autoregressive Modeling with VLM-Derived Modulation
Positive · Artificial Intelligence
A new generative framework has been proposed for enhancing low-light images and reducing blur, utilizing visual autoregressive modeling guided by perceptual priors from vision-language models. This approach addresses significant challenges in restoring dark images, which often suffer from low visibility, contrast, noise, and blur.
CLASH: A Benchmark for Cross-Modal Contradiction Detection
Positive · Artificial Intelligence
CLASH has been introduced as a new benchmark for cross-modal contradiction detection, addressing the prevalent issue of contradictory multimodal inputs in real-world scenarios. This benchmark utilizes COCO images paired with captions that contain controlled contradictions, aiming to enhance the reliability of AI systems by evaluating their ability to detect inconsistencies across different modalities.
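
To show what a controlled contradiction pair might look like in code, here is an illustrative sample structure; every field name, path, and caption below is a hypothetical stand-in, not the released CLASH schema.

```python
# Illustrative shape of a cross-modal contradiction sample; all fields
# and values are hypothetical placeholders for the real schema.
from dataclasses import dataclass

@dataclass
class ContradictionSample:
    image_path: str              # a COCO image
    caption: str                 # caption with one injected, controlled contradiction
    contradicted_attribute: str  # e.g. "object count", "color", "spatial relation"
    label: bool                  # True if the caption contradicts the image

sample = ContradictionSample(
    image_path="coco/val2017/xxxx.jpg",  # placeholder path
    caption="Three red buses are parked on the street.",
    contradicted_attribute="object count",
    label=True,
)
# A system is scored on whether it recovers sample.label, and optionally
# on identifying which attribute was contradicted.
```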
Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding
Positive · Artificial Intelligence
The introduction of Synthetic Object Compositions (SOC) marks a significant advancement in the field of computer vision, providing a scalable and accurate data synthesis pipeline for tasks such as instance segmentation, visual grounding, and object detection. This innovative approach utilizes 3D geometric layout and camera configuration augmentations to create high-quality synthetic object segments, addressing the limitations of traditional annotated datasets.