A framework for analysis and comparison of deepfakes detection methods
Citation: Wang, C. (2021). A framework for analysis and comparison of deepfakes detection methods. (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Computing). Unitec Institute of Technology, New Zealand. Retrieved from https://hdl.handle.net/10652/5395
Permanent link to Research Bank record: https://hdl.handle.net/10652/5395
With the rise of artificial intelligence (AI), Deepfakes technology is increasingly used to generate fake pictures and videos. Like all technologies, it brings benefits but also has downsides, such as spreading false information and endangering the public interest. To combat the harm caused by fake-face videos, researchers have proposed a variety of deep forgery detection algorithms and have achieved remarkable results. However, a common problem with these detection methods is that in-library detection (training and testing on the same dataset) normally achieves high accuracy, while performance degrades severely in cross-library detection (testing on a dataset not seen during training). In other words, these methods generalise poorly. The current mainstream approach is to train a binary classification model on real and fake videos and use it to classify each video as real or tampered. Consequently, Deepfakes detection methods are assessed with the evaluation criteria of binary classification models. However, each evaluation criterion places a different emphasis depending on the application scenario and technical requirements, so a single criterion cannot effectively compare the performance of different detection methods. In the current literature, Deepfakes detection is usually evaluated using only the Area Under the Curve (AUC) of the binary classification model. AUC, however, depends only on the relative ordering of the predicted probabilities; it does not take into account the decision threshold or the absolute size of the probability values. Moreover, when the data is highly imbalanced, AUC may not properly assess the performance of a detection method.
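The point that AUC depends only on the ordering of the predicted probabilities can be illustrated with a minimal Python sketch. The labels and scores below are invented toy values, not results from the thesis, and the helper names `auc` and `accuracy` are chosen here purely for illustration. Two hypothetical detectors rank the videos identically, so their AUC is the same, yet their accuracy at a fixed threshold of 0.5 differs.

```python
def auc(labels, scores):
    """Rank-based AUC: share of (fake, real) pairs in which the fake video
    receives the higher score; ties count as half a win."""
    fake = [s for y, s in zip(labels, scores) if y == 1]
    real = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for f in fake:
        for r in real:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fake) * len(real))


def accuracy(labels, scores, threshold=0.5):
    """Share of videos classified correctly when score >= threshold means 'fake'."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Toy data (assumption): 0 = real video, 1 = fake video.
labels = [0, 0, 0, 1, 1, 1]
scores_a = [0.10, 0.20, 0.30, 0.70, 0.80, 0.90]  # well-separated scores
scores_b = [0.60, 0.65, 0.70, 0.80, 0.85, 0.90]  # same ordering, shifted upwards

for name, scores in [("A", scores_a), ("B", scores_b)]:
    print(name, "AUC =", auc(labels, scores), "ACC@0.5 =", accuracy(labels, scores))
```

Both detectors reach an AUC of 1.0, but detector B labels every video as fake at the 0.5 threshold, so half of its predictions are wrong. This is the kind of difference that a threshold-dependent criterion exposes and AUC alone does not.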
To better compare the performance of different detection methods, this thesis analyses and compares six Deepfakes detection methods: Two-stream, MesoNet, HeadPose, FWA, VA and Multi-task. To assess the generalisation ability of these methods, I conduct intra-library and cross-library tests on three existing fake-face video datasets. Accuracy (ACC) and error rate are used as the evaluation criteria, and I focus on analysing the impact of three factors (dataset partitioning, data augmentation operations, and detection threshold selection) on the generalisation ability of the Deepfakes detection methods. The principal contributions of this thesis are as follows: 1) an analytical cross-library platform for comparing Deepfakes detection methods; 2) new evaluation metrics based on data partitioning, augmentation, and threshold selection; 3) an overall indication of the performance of detection methods based on their generalisation performance on different data types.
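How detection threshold selection can interact with cross-library testing may be sketched as follows. This is an illustrative assumption, not the procedure used in the thesis: the scores are invented toy values, and `error_rate` and `best_threshold` are hypothetical helpers. The threshold is chosen to minimise the error rate on an in-library set and is then applied unchanged to a cross-library set.

```python
def error_rate(labels, scores, threshold):
    """Share of videos misclassified when score >= threshold means 'fake'."""
    wrong = sum(int((s >= threshold) != (y == 1)) for y, s in zip(labels, scores))
    return wrong / len(labels)


def best_threshold(labels, scores, candidates):
    """Candidate threshold with the lowest error rate (first one wins on ties)."""
    return min(candidates, key=lambda t: error_rate(labels, scores, t))


# Toy data (assumption): 0 = real video, 1 = fake video.
in_labels = [0, 0, 0, 0, 1, 1, 1, 1]
in_scores = [0.05, 0.10, 0.20, 0.30, 0.70, 0.80, 0.90, 0.95]

# Scores from the same hypothetical detector on an unseen dataset,
# where real and fake videos overlap much more.
cross_labels = [0, 0, 0, 0, 1, 1, 1, 1]
cross_scores = [0.20, 0.35, 0.45, 0.60, 0.40, 0.55, 0.65, 0.80]

candidates = [i / 10 for i in range(1, 10)]
threshold = best_threshold(in_labels, in_scores, candidates)

in_err = error_rate(in_labels, in_scores, threshold)
cross_err = error_rate(cross_labels, cross_scores, threshold)
print("threshold:", threshold)
print("in-library error rate:", in_err)
print("cross-library error rate:", cross_err)
```

In this toy setting the threshold that is error-free in-library misclassifies a quarter of the cross-library videos, which illustrates why the threshold is treated as a factor in its own right when generalisation is assessed.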