Introduction
Generative artificial intelligence, represented by Sora, DALL-E, and ChatGPT, is sweeping the globe. Widely applied in fields such as knowledge Q&A and video generation, it has become an innovation engine for several key industries. As a crucial element of the new generation of national AI infrastructure, generative AI has become a focus of competition among countries over strategic resources and soft power. Meanwhile, the maturation and widespread use of generative AI has introduced new security risks, such as the generation of fake information, leading to frequent security incidents. Lawbreakers use deepfake technology to create convincingly realistic harmful videos and images that spread widely online, causing problems such as telecom fraud, defamation of national leaders, forged pornographic videos, and deception of facial recognition systems, thereby threatening national security and social stability.
As the ancient saying goes, seeing is believing: compared with text and audio, visual information is more intuitive and persuasive. Today, however, seeing may no longer mean believing, because images and videos fabricated with deep synthesis technology can be highly misleading. For instance, during the 2022 Russia-Ukraine conflict, a forged video of Ukrainian President Volodymyr Zelenskyy calling on his soldiers to surrender went viral online. In 2024, scammers used deepfake technology to impersonate a senior manager of a British company in a video conference, defrauding a Hong Kong-based company of nearly HKD 200 million; also in 2024, a fabricated pornographic video featuring the American celebrity Taylor Swift sparked online discussions involving millions of people. These cases highlight the threats that deep synthesis technology poses to individuals, society, and national security.
To address these threats, in 2022 the National Internet Information Office, the Ministry of Industry and Information Technology, and the Ministry of Public Security jointly issued the Regulations on the Management of Deep Synthesis Services for Internet Information Services, setting standardized requirements for technologies such as face generation, replacement, editing, and manipulation. In 2023, seven departments including the National Internet Information Office promulgated the Interim Measures for the Administration of Generative AI Services, further clarifying the classification and grading supervision of generative AI services and requiring providers of deepfake services to label generated content such as images and videos. The enforcement of these regulations, however, relies on accurate and broadly applicable deepfake detection technology.
With the continuous evolution of deepfake technology, detecting forgeries in images and videos is becoming increasingly challenging. Differences in image and video data distributions, together with the wide variety of forgery algorithms, leave existing detection algorithms with low accuracy and inadequate generalization. Moreover, current forgery detection methods offer weak security protection against ever-evolving generative AI technologies, complex and changing real-world detection scenarios, and diverse data attack methods. To tackle these challenges, the Large Model Data Security Team at the National Key Laboratory of Blockchain and Data Security at Zhejiang University focuses on critical technological breakthroughs in visual deepfake detection, and has launched the Visual Deepfake Detection Platform DFscan to counter the threats posed by visual forgery techniques across real-world scenarios.
I. Deepfake Video Detection Platform DFscan
DFscan is a visual forgery detection platform developed by the Large Model Data Security Team at the National Key Laboratory of Blockchain and Data Security. Starting from promoting the legalization and healthy development of generative AI, the platform focuses on risk management and control of deepfake content in images/videos. It can be applied to scenarios such as AI fraud video identification, facial recognition system protection, monitoring of forged videos of key figures, and compliance checks of media content. Additionally, the detection platform supports fine-grained detection features like forged area localization, visualization of forged characteristics, and tracing of forgery methods, capable of generating multi-dimensional visualized detection reports.
Introduction to DFscan Platform
DFscan is an open-world general-purpose detection platform for different scenarios, supported by a data foundation, driven by proprietary algorithms, and centered around detection accuracy. It has established sub-platforms for image/video forgery detection and detection algorithm evaluation.
The core function of the image/video forgery detection sub-platform is to determine whether an input image/video was genuinely captured or generated by an AI model. Compared with existing deepfake detection technologies, DFscan addresses the poor generalization and insufficient robustness of current algorithms, achieving improved detection accuracy across multiple data domains. The core function of the detection algorithm evaluation sub-platform is to assess the detection capabilities of submitted video/image forgery detection algorithms. To establish unified, multidimensional evaluation standards, the evaluation platform constructs numerous datasets covering six mainstream types of video deepfakes. It also maintains a comparative algorithm library with a large number of open-source detection algorithms as benchmarks, and generates comprehensive performance evaluation reports based on multidimensional metrics.
Platform Architecture Diagram
As shown in the architecture diagram above, DFscan consists of two functional sub-platforms and a large-scale data foundation. The sub-platforms reinforce each other technically, while the data foundation underpins their construction; together they form a complete and efficient system that provides strong support for related research and applications. As the diagram below illustrates, users can upload videos for detection via the web page or call the platform's detection functions directly through the API. After pre-processing operations such as frame extraction and face-region cropping, the video under test is fed into multiple categories of expert models for forgery detection, where each category contains several sub-models with identical structures. In the decision-making phase, the sub-models within each category are assigned different weights and their outputs are aggregated by weighted summation; a consensus mechanism then integrates the confidence scores across the expert-model categories. The platform determines the optimal judgment threshold via the Equal Error Rate (EER) and compares the final detection confidence with this threshold to produce the authenticity verdict.
DFscan Detection Process and Scoring Criteria
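The decision pipeline described above (weighted aggregation within each expert category, consensus across categories, and an EER-derived threshold) can be sketched as follows. This is a minimal illustrative sketch, not DFscan's actual implementation: the function names, the uniform consensus averaging, and the brute-force EER search are all assumptions. Scores follow the platform's convention that higher confidence means more genuine.

```python
import numpy as np

def aggregate_expert_scores(sub_scores, sub_weights):
    """Weighted sum of sub-model confidences within one expert-model category."""
    w = np.asarray(sub_weights, dtype=float)
    w = w / w.sum()  # normalize weights so they sum to 1
    return float(np.dot(w, np.asarray(sub_scores, dtype=float)))

def eer_threshold(real_scores, fake_scores):
    """Pick the threshold where the false-accept and false-reject rates
    are closest, i.e. the Equal Error Rate operating point."""
    candidates = np.sort(np.concatenate([real_scores, fake_scores]))
    best_t, best_gap = float(candidates[0]), np.inf
    for t in candidates:
        far = np.mean(fake_scores >= t)   # fakes wrongly accepted as real
        frr = np.mean(real_scores < t)    # reals wrongly rejected as fake
        if abs(far - frr) < best_gap:
            best_t, best_gap = float(t), abs(far - frr)
    return best_t

def decide(category_scores, threshold):
    """Consensus step: average the per-category confidences and compare
    with the EER-derived threshold to get the authenticity verdict."""
    final = float(np.mean(category_scores))
    return final, ("real" if final >= threshold else "fake")
```

In practice the consensus step could weight categories unequally or use voting; uniform averaging is used here only to keep the sketch compact.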
Features and Advantages of DFscan
The DFscan platform emphasizes self-iteration and the latest research on deepfake technology. It boasts significant advantages in data and algorithm accumulation, algorithm evaluation, core detection capabilities, and multifunctional services.
Platform Functionality Showcase
[Extensive Accumulation of Data, Algorithms, Achievements, and Projects]
The team has constructed a visual deepfake detection platform based on a knowledge-plus-data-driven model. The core of the platform lies in the dual accumulation of high-quality data and superior algorithms. DFscan aggregates representative and diverse open-source data while deploying numerous forgery algorithms and building large-scale proprietary datasets to support efficient model training and enhance detection capabilities against various deepfake techniques. By identifying characteristic differences between forged and authentic content and uncovering consistent features of deepfakes generated through different methods, the platform can accurately identify potential forged images/videos.
1. Large-Scale Database and Scenario-Based Data Augmentation
The DFscan platform has collected 15 publicly available forged and real datasets and deployed batch generation scripts for over 20 common forgery algorithms, constructing a self-built image/video forgery database with tens of millions of entries. During database construction, we considered attribute differences such as ethnicity, gender, age, mixtures of genuine and fake segments, shooting angles, lighting, etc. Moreover, the platform designs scenario-based data augmentation systems tailored for real-world detection tasks, focusing on enhancing robustness and generalizability to improve the practical performance of the DFscan platform. Data augmentation includes techniques like rotation, noise addition, compression, filtering, and adjustments to brightness, saturation, and contrast, simulating real-world scenarios to boost detector robustness. Additionally, some images undergo special processing using reconstruction and adversarial attack algorithms, further enhancing the detector's ability to detect various attack methods and counteract malicious evasion attacks.
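The augmentation families listed above (noise, compression-like degradation, brightness/saturation/contrast adjustment) can be sketched with plain numpy. This is a hedged illustration of the general technique, not DFscan's augmentation system: the function names, parameter ranges, and the crude quantization stand-in for compression are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img, sigma=8.0):
    """Simulate sensor noise or re-encoding artifacts with additive Gaussian noise."""
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def adjust_brightness_contrast(img, brightness=0.0, contrast=1.0):
    """Linear pixel transform: out = contrast * (in - 128) + 128 + brightness."""
    out = contrast * (img.astype(np.float32) - 128.0) + 128.0 + brightness
    return np.clip(out, 0, 255).astype(np.uint8)

def jpeg_like_quantize(img, step=16):
    """Crude stand-in for compression loss: quantize pixel values to a grid."""
    return ((img.astype(np.int32) // step) * step).astype(np.uint8)

def augment(img):
    """Apply a random subset of perturbations, mimicking real-world degradations
    that a detector must stay robust to."""
    if rng.random() < 0.5:
        img = add_gaussian_noise(img)
    if rng.random() < 0.5:
        img = adjust_brightness_contrast(img,
                                         brightness=rng.uniform(-20, 20),
                                         contrast=rng.uniform(0.8, 1.2))
    if rng.random() < 0.5:
        img = jpeg_like_quantize(img)
    return img
```

Production pipelines would typically use a library such as torchvision or albumentations for these transforms; the point here is only that each perturbation maps a clean frame to a plausibly degraded one while preserving the label.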
2. Diverse Algorithm Foundation
Besides four fundamental types of forgery algorithms—face swapping, expression driving, attribute editing, and full-face synthesis—the DFscan platform's forgery algorithm foundation also encompasses text-to-image/video and audio-driven image forgery algorithms. On the detection side, the platform deploys more than a dozen typical open-source forgery detection algorithms for capability assessment and comparison: over ten conventional detectors that judge authenticity at the video, full-image, and face level, plus two specialized detectors that localize forged image regions and identify forged video segments. This rich collection of detection algorithms and accumulated evaluation results lays a solid foundation for the platform's core detection capabilities.
This comprehensive approach ensures that DFscan remains at the forefront of deepfake detection technology, providing robust solutions to emerging challenges in the field.
Forgery Feature Decoupling Method Based on Maximizing Mutual Information
2. High-Precision Detection Model for Face-Swap Scenarios
The platform conducts research on high-risk scenarios in deepfakes, particularly focusing on face swapping. It deploys over ten open-source detection models and identifies that existing face-swap detection algorithms generally suffer from low detection accuracy across different forgery algorithms. To address this issue, the team has designed a dual-stream detection model based on localization and verification. This model utilizes three functional modules to collaboratively process multi-modal and multi-scale features, employing localized forged regions to guide the model in sensing potential face-swap areas, thereby enhancing the model's detection capability for face-swap data.
Moreover, to solve the problem of a lack of forged region annotations in the vast number of face-swap images within the database, the team proposes a forged region estimation strategy based on semi-supervised learning. This approach aims to estimate the likely forged regions without extensive manual labeling, thus improving the efficiency and effectiveness of the detection model. By integrating these advanced techniques, the platform significantly enhances its ability to accurately detect and respond to sophisticated face-swap forgeries, offering robust solutions to the challenges posed by evolving deepfake technologies.
Dual-Stream Detection Model Based on Localization and Verification
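The idea of using localized forged regions to guide classification, as in the dual-stream model above, can be illustrated with a small mask-guided pooling step. This is a hypothetical sketch of the general mechanism, not the team's model: the function name, tensor layout (channels, height, width), and the simple weighted-average pooling are all assumptions.

```python
import numpy as np

def mask_guided_pool(features, mask, eps=1e-6):
    """Pool a feature map using the localization stream's forgery mask as
    spatial weights, so the verification stream attends to suspected
    face-swap regions rather than the whole face.

    features: array of shape (C, H, W); mask: array of shape (H, W) with
    higher values where the localization stream suspects forgery."""
    w = mask / (mask.sum() + eps)                        # normalize mask weights
    return (features * w[None, :, :]).sum(axis=(1, 2))   # weighted mean per channel
```

A classifier head would then operate on this pooled vector; when the mask is uniform, the operation degenerates to ordinary global average pooling, which is why region localization adds discriminative signal.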
3. Forgery Detection for Key Figures
Existing forgery detection algorithms often suffer from suboptimal performance when applied across different datasets. In recent years, the spread of forged videos targeting key figures on social media has been increasing, highlighting the urgent need for high-accuracy and high-generalization forgery detection models specifically tailored for these individuals. Research has shown that although videos generated by current forgery algorithms are highly realistic and can deceive human eyes to a large extent, deep neural networks can still distinguish them by extracting features. This distinction is particularly evident in identity features. Experiments have demonstrated that identity feature extraction networks commonly used in facial recognition scenarios place genuine and fake face identities in distinct locations within feature space.
Based on this finding, the team has developed an identity-prior-based forgery detection method for key figures. By introducing real images of key figures as prior information, they have constructed a forgery detection framework specifically designed for scenarios involving forged videos of these individuals, significantly enhancing detection accuracy and cross-domain capability. Using genuine images as a reference allows the model to better identify discrepancies and forgery characteristics, improving its effectiveness against sophisticated forgeries targeting important figures. This advancement is crucial for safeguarding prominent figures against the misuse of deepfake technology and for enhancing trust in digital communications.
Key Figure Detection Method Based on Identity Prior Information
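The identity-prior idea—that genuine and forged faces of the same person land in different places in identity-embedding space—can be sketched as a similarity check against trusted reference embeddings. This is an illustrative sketch only: the function names, the cosine-similarity metric, and the 0.6 threshold are assumptions, and real embeddings would come from a face-recognition network rather than toy vectors.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two identity-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identity_prior_score(test_emb, reference_embs, threshold=0.6):
    """Compare a test face's identity embedding with trusted reference
    embeddings of the key figure; a low best-match similarity suggests
    the identity in the test image was forged."""
    sims = [cosine_similarity(test_emb, r) for r in reference_embs]
    best = max(sims)
    return best, ("suspect-forged" if best < threshold else "consistent")
```

Because the reference embeddings act as a per-person prior, this style of check tends to transfer across forgery algorithms better than a generic real/fake classifier, which matches the cross-domain motivation described above.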
4. Transfer of Detection Capability Across Forgery Algorithms
The platform is dedicated to conducting in-depth research on emerging deepfake technologies and continuously updating and iterating detection models. However, in practical applications, challenges such as imbalanced data for new types of forgery samples and catastrophic forgetting during model iteration are encountered. These issues make it difficult for existing detection models to effectively learn from a small number of new forgery samples. Directly training the model with limited new data not only fails to provide sufficient feature information due to the small dataset size but may also lead to catastrophic forgetting, resulting in the loss of existing detection capabilities.
To address these challenges, the team proposes adopting an incremental learning strategy based on domain adaptation. This strategy utilizes supervised contrastive learning to fully exploit the relationships between old and new data, enabling the model to effectively learn new tasks. Moreover, by combining multi-angle knowledge distillation strategies and hard sample replay techniques, the model's generalization ability and stability are further enhanced, effectively mitigating the issue of catastrophic forgetting.
Through the implementation of this strategy, we aim to achieve continuous tracking and effective response to deepfake technologies while constantly optimizing and improving detection models. This will enhance their application effectiveness in real-world scenarios, ensuring that they remain robust and reliable against evolving threats. The approach ensures that the platform can maintain high performance and adaptability, providing a strong defense against the ever-changing landscape of deepfake technologies.
Incremental Learning Framework for Forgery Detection Based on Domain Adaptation
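One building block of the knowledge-distillation strategy mentioned above—keeping the updated model's outputs close to the previous model's to mitigate catastrophic forgetting—can be sketched as a temperature-softened KL-divergence loss. This is a generic distillation loss, not the team's multi-angle variant; the function names and the temperature value are assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened outputs. Minimizing this
    while training on new forgery samples keeps the updated (student) model's
    predictions close to the old (teacher) model's, preserving prior
    detection capability. The T*T factor is the standard gradient rescaling."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)))) * T * T
```

In an incremental-learning setup this term would be added to the supervised loss on the new forgery samples, trading plasticity on new data against stability on old data.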
[Multi-functional Detection Services]
DFscan is designed to meet the detection needs of real-world scenarios, leveraging the team's extensive accumulation in data, algorithms, and engineering to provide users with trustworthy, precise, feature-rich, transparent, and user-friendly detection services. The platform supports the detection of both single and batch image and video data. Users only need to upload the data to be tested with a single click and wait a few seconds for the platform to return a detailed detection report. This report not only provides the determination result but also meticulously explains the confidence scores given by each expert model. For video data, the report further offers fine-grained segment determination results and average confidence scores. In the future, the platform will gradually introduce more detailed functionalities such as forged region localization and forgery method tracing.
The DFscan platform enhances the accuracy and clarity of determination results through multi-expert models, multi-dimensional detection, and multiple visualization methods. This approach significantly improves the user experience and decision-making trustworthiness by providing comprehensive insights into the authenticity of the uploaded content. By offering these advanced features and easy-to-understand reports, DFscan ensures that users can make informed decisions based on reliable detection outcomes, effectively addressing concerns related to deepfake content.
DFscan Platform Video Batch Detection Effect Display
DFscan Video Detection Results Display (Note: The higher the confidence score on the left, the more genuine the video frame is)
[DFscan Detection Capabilities]
DFscan boasts industry-leading capabilities in detecting the authenticity of images and videos. The team constructed a complex test set by randomly selecting 20,000 images from multiple representative datasets for face swapping (including FF++, DFDC, Celeb-DF, etc.). This test set was used to compare the detection capabilities of DFscan against several well-known open-source models. As shown in the figure below, compared to existing open-source detection models, DFscan demonstrates superior detection performance on open-source datasets.
Performance Comparison Between DFscan and Well-Known Open-Source Detection Models
To further evaluate DFscan's detection capabilities across forgery types, the team ran forgery detection tests on images/videos synthesized by some of the most popular generative models in the current visual AIGC field, including StyleGAN (based on generative adversarial networks), the diffusion-based Stable Diffusion and Midjourney, and OpenAI's latest text-to-video model Sora. As the figure below shows, DFscan achieves high detection performance against all of these image/video generation models, with an average accuracy exceeding 90%. These results demonstrate DFscan's effectiveness in combating deepfakes and safeguarding the integrity of digital content; the platform also continuously updates its algorithms to adapt to emerging threats, maintaining its leading position in forgery detection.
DFscan Detection Capability Against Representative Generation Algorithms
II. Future Plans
Generative AI has demonstrated its innovative potential across various societal levels, bringing convenience to people's lives while simultaneously raising significant concerns regarding its security. In this context, the Large Model Data Security Team at the National Key Laboratory of Blockchain and Data Security at Zhejiang University is dedicated to researching the safety of deep synthesis content. The team has successfully constructed a data foundation containing tens of millions of pieces of forged content and has developed and reproduced over a dozen deep synthesis algorithms. Continuously promoting the scenario-specific application of visual deepfake detection platforms, the team provides strong support for research and applications in AI safety.
Given the continuous evolution of video deepfake technology, DFscan will keep enhancing its detection capabilities and presentation formats to meet users' growing needs and address future technological challenges. First, the team will continue to expand the data foundation and iteratively update detection technologies to achieve more precise localization and recognition of forgery features, improving detection accuracy against the latest image/video generation models such as DALL·E 3 and Sora. Second, efforts will be made to enhance detection robustness under adversarial conditions to counter various forms of image/video forgery attacks. In addition, the presentation of detection results will be improved for better interpretability, so that users can more intuitively understand the results and the underlying reasoning. Finally, the platform will actively expand its applicable scenarios to meet diverse user needs.
Looking ahead, DFscan will rely on the substantial strength of the National Key Laboratory of Blockchain and Data Security at Zhejiang University to actively engage in in-depth cooperation and exchanges with all sectors of society. We sincerely invite relevant professionals from all walks of life to contact us via email to obtain test accounts and directly experience the platform's video forgery detection functions. For users requiring automated deployment, we can provide API interfaces for remote invocation. Moreover, we welcome enterprises and institutions to propose specific business requirements, and we will offer customized synthetic video image detection services tailored to specific business scenarios. We also warmly welcome technical personnel and security researchers to provide valuable feedback on our platform and extend an invitation to universities, research institutions, and industry partners to collaborate deeply with us on scientific research and industrial-academic transformation, contributing collectively to the secure development of generative AI. Please reach out to us at DFscanzju@outlook.com for any collaboration intentions or suggestions.
—————————————————————————————————
The National Key Laboratory of Blockchain and Data Security at Zhejiang University was officially approved by the Ministry of Science and Technology in November 2022. Led by Academician Chen Chun as director, the laboratory focuses on international frontiers in blockchain and data security technology, aiming for high-level self-reliance in science and technology and striving to become a world-class strategic scientific force. Centered on integrating industry, education, and research, the laboratory conducts systematic and innovative scientific research. Its main research directions include blockchain technology and platforms, blockchain regulation and monitoring, smart contracts and distributed software, data element security and privacy computing, AI data security and cognitive confrontation, AI-native data processing systems, network data governance, intelligent connected vehicle data security, and trusted data storage and computing technology.
Led by Professor Ren Kui, Executive Deputy Director and Dean of the School of Computer Science at Zhejiang University, the Large Model Data Security Team within the laboratory conducts foundational research on data security and privacy under several national and provincial projects. Their work includes building large model data security evaluation platforms and security components, providing theoretical support, compliance testing, and security reinforcement services to ensure the entire process of data security during the training, deployment, and use of large models.