CGFormer: ViT-Based Network for Identifying Computer-Generated Images With Token Labeling

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY(2024)

引用 0|浏览3
暂无评分
摘要
The advanced graphics rendering techniques and image generation algorithms significantly improve the visual quality of computer-generated (CG) images, and this makes it more challenging to distinguish between CG images and natural images (NIs) for a forensic detector. For the identification of CG images, human beings often need to inspect and evaluate the entire image and its local region as well. In addition, we observe that the distributions of both near and far patch-wise correlation have differences between CG images and NIs. Current mainstream methods adopt the CNN-based architecture with the classical cross entropy loss, however, there are several limitations: 1) the weakness of long-distance relationship modeling of image content due to the local receptive field of CNN; 2) the pixel sensitivity due to the convolutional computation; 3) the insufficient supervision due to the training loss on the whole image. In this paper, we propose a novel vision transformer (ViT)-based network with token labeling for CG image identification. Our network, called CGFormer, consists of patch embedding, feature modeling, and token prediction. We apply patch embedding to sequence the input image and weaken the pixel sensitivity. Stacked multi-head attention-based transformer blocks are utilized to model the patch-wise relationship and introduce a certain level of adaptability. Besides the conventional classification loss on class token of the whole image, we additionally introduce a soft cross entropy loss on patch tokens to comprehensively exploit the supervision information from local patches. Extensive experiments demonstrate that our method achieves the state-of-the-art forensic performance on six publicly available datasets in terms of classification accuracy, generalization, and robustness. Code is available at https://github.com/feipiefei/CGFormer.
更多
查看译文
关键词
CG image forensics,transformer,token labeling,generalization,robustness
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要