Multi-modal Deepfake Detection and Localization with FPN-Transformer

arXiv (cs.CV), Wednesday, November 12, 2025 at 5:00:00 AM
The FPN-Transformer framework is a new approach to deepfake detection, a problem that matters because manipulated media erodes digital trust. Traditional unimodal detectors analyze audio or video in isolation. They therefore cannot exploit cross-modal correlations, and they struggle both to identify manipulated content and to localize where the manipulation occurs. The FPN-Transformer addresses this gap in two ways. It extracts features with pre-trained models: WavLM for audio and CLIP for video. It then processes those features at multiple scales with a feature pyramid network (FPN) combined with a Transformer, so that audio and visual evidence can be analyzed jointly. In experiments, the framework achieved a score of 0.7535 on the IJCAI'25 DDL-AV benchmark. More accurate detection and localization of audio-visual forgeries could make media verification more reliable and, in turn, strengthen trust in digital communications.
— via World Pulse Now AI Editorial System
