A defensive attention mechanism to detect deepfake content across multiple modalities

S. Asha, P. Vinod, Varun G. Menon

Multimedia Systems (2024)

Abstract
Recently, the realistic nature of multi-modal deepfake content has attracted much attention from researchers. A variety of handcrafted features, learned features, and deep learning techniques have achieved promising performance in recognizing facial deepfakes. However, attackers continue to create deepfakes that evade earlier detection methods by manipulating multiple modalities, making deepfake identification across modalities difficult. To exploit the merits of attention-based network architectures, we propose a novel cross-modal attention architecture on a bi-directional recurrent convolutional network to capture fake content in audio and video. For effective deepfake detection, the system records the spatial-temporal deformations of audio-video sequences and investigates the correlation between these modalities. We propose a VGG16 deep model with self-attention for extracting visual features for facial fake recognition. In addition, the system incorporates a recurrent neural network with self-attention to extract false audio elements effectively. The cross-modal attention mechanism learns the divergence between the two modalities. Furthermore, we include multi-modal fake examples to create a well-balanced bespoke dataset that addresses the drawbacks of small and unbalanced training samples. We evaluate the effectiveness of our proposed multi-modal deepfake detection strategy against state-of-the-art methods on a variety of existing datasets.
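The cross-modal attention the abstract describes, where features from one modality (e.g. audio) attend over features from the other (video) to expose divergence between them, can be sketched in plain NumPy. This is a minimal single-head illustration of the general technique, not the authors' implementation; all dimensions, weight initializations, and names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(audio, video, d_k=32, seed=0):
    """Single-head cross-modal attention (illustrative sketch).

    Audio frames act as queries; video frames supply keys and values,
    so each audio step gathers the video content it aligns with.
    audio: (T_a, D_a) audio feature sequence
    video: (T_v, D_v) video feature sequence
    Returns fused features (T_a, d_k) and the (T_a, T_v) attention map.
    """
    rng = np.random.default_rng(seed)  # random projections stand in for learned weights
    W_q = rng.standard_normal((audio.shape[1], d_k)) / np.sqrt(audio.shape[1])
    W_k = rng.standard_normal((video.shape[1], d_k)) / np.sqrt(video.shape[1])
    W_v = rng.standard_normal((video.shape[1], d_k)) / np.sqrt(video.shape[1])
    Q, K, V = audio @ W_q, video @ W_k, video @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (T_a, T_v) audio-to-video alignment
    return attn @ V, attn
```

In a trained detector the projection matrices would be learned, and a low or diffuse alignment between the two modalities is the kind of divergence signal such a mechanism can surface.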
Keywords
Multi-modal, Deepfakes, Attention mechanism, Multi-modal deepfake dataset, Multi-modal fusion