CapST: An Enhanced and Lightweight Model Attribution Approach for Synthetic Videos
arXiv (Cornell University), 2023
Abstract
Deepfake videos, generated through AI face-swapping techniques, have garnered
considerable attention due to their potential for powerful impersonation
attacks. While existing research primarily focuses on binary classification to
discern between real and fake videos, determining the specific generation
model for a fake video is crucial for forensic investigation.
Addressing this gap, this paper investigates the model attribution problem of
Deepfake videos from a recently proposed dataset, Deepfakes from Different
Models (DFDM), derived from various Autoencoder models. The dataset comprises
6,450 Deepfake videos generated by five distinct models with variations in
encoder, decoder, intermediate layer, input resolution, and compression ratio.
This study formulates Deepfake model attribution as a multiclass
classification task, proposing a segment of VGG19, known for its effectiveness
in image-related tasks, as a feature extraction backbone, integrated with a
Capsule Network and a Spatio-Temporal attention mechanism. The Capsule module
captures intricate hierarchies among features for robust identification of
deepfake attributes. Additionally, the video-level fusion technique leverages
temporal attention mechanisms to handle concatenated feature vectors,
capitalizing on inherent temporal dependencies in deepfake videos. By
aggregating insights across frames, our model gains a comprehensive
understanding of video content, resulting in more precise predictions.
Experimental results on the deepfake benchmark dataset (DFDM) demonstrate the
efficacy of our proposed method, achieving up to a 4% improvement in
categorizing deepfake videos compared to baseline models while demanding fewer
computational resources.
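To make the described pipeline concrete, here is a minimal, hypothetical PyTorch sketch of the architecture outlined in the abstract: a truncated VGG19 backbone, a small capsule layer with the standard squashing non-linearity, and temporal-attention fusion over per-frame features. The layer cut-off, capsule counts, and dimensions are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

def squash(x, dim=-1, eps=1e-8):
    """Standard capsule squashing non-linearity."""
    norm_sq = (x ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * x / (norm_sq.sqrt() + eps)

class CapSTSketch(nn.Module):
    def __init__(self, num_classes=5, num_caps=8, caps_dim=16):
        super().__init__()
        # Only the early VGG19 conv blocks serve as a lightweight backbone
        # (the slice index is an assumption; output has 128 channels).
        self.backbone = vgg19(weights=None).features[:10]
        self.primary_caps = nn.Conv2d(128, num_caps * caps_dim, kernel_size=3, stride=2)
        self.num_caps, self.caps_dim = num_caps, caps_dim
        # Temporal attention: one scalar score per frame descriptor.
        self.attn = nn.Linear(num_caps * caps_dim, 1)
        self.classifier = nn.Linear(num_caps * caps_dim, num_classes)

    def forward(self, video):                      # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        frames = video.flatten(0, 1)               # (B*T, 3, H, W)
        feats = self.backbone(frames)              # (B*T, 128, h, w)
        caps = self.primary_caps(feats)            # (B*T, num_caps*caps_dim, h', w')
        caps = caps.flatten(2).mean(-1)            # global-average pool spatially
        caps = squash(caps.view(b * t, self.num_caps, self.caps_dim)).flatten(1)
        caps = caps.view(b, t, -1)                 # per-frame capsule descriptors
        weights = torch.softmax(self.attn(caps), dim=1)  # temporal attention weights
        fused = (weights * caps).sum(dim=1)        # attention-weighted video descriptor
        return self.classifier(fused)              # logits over generation models

# Example: five-way model attribution for a batch of 2 videos, 8 frames each.
model = CapSTSketch(num_classes=5)
logits = model(torch.randn(2, 8, 3, 112, 112))
print(logits.shape)  # torch.Size([2, 5])
```

In this sketch the video-level fusion is a simple softmax-weighted sum of frame descriptors, which reflects the abstract's description of aggregating insights across frames via temporal attention; the paper's exact fusion and routing details may differ.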
Keywords
deepfake video