Hybrid Convolutional Autoencoder-Hierarchical Clustering Algorithm To Reveal Image Spam Sources.

IRI(2023)

Abstract
We propose a novel hybrid algorithm framework for clustering images received in spam emails by authorship. These images are multimodal: they contain foreground objects, text, or a combination of both, which makes them difficult to group effectively. To address this challenge, we train convolutional autoencoders (CAEs) and use the encoder of each trained CAE to extract visual features from the images. We also apply an optical character recognition (OCR) algorithm to extract text from the images. The extracted text and visual features, together with layout features, are used to construct matrices that measure the similarity between each pair of images in our experimental dataset. We then apply a two-stage hierarchical clustering algorithm to group the images, and compare the resulting clusters with ground truth collected by a domain expert. Our experiments show that relatively simple CAEs, with as few as thirty-seven visual features, can achieve homogeneity, completeness, and V-measure scores as high as those obtained from more complex convolutional neural networks (CNNs).
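The abstract reports homogeneity, completeness, and V-measure against expert ground truth. As a minimal sketch of how these three scores are commonly computed (following the standard entropy-based definitions, not code from the paper), the snippet below uses only the Python standard library. The function names and the toy labels are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """Conditional entropy H(target | given) for two parallel label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(truth, pred, beta=1.0):
    """Return (homogeneity, completeness, V-measure) for a clustering.

    truth: ground-truth class per item (e.g. expert-assigned spam source).
    pred:  cluster id per item produced by the clustering algorithm.
    beta:  weight on completeness relative to homogeneity (1.0 = balanced).
    """
    h_truth = entropy(truth)
    h_pred = entropy(pred)
    # Homogeneity: each cluster contains members of a single class.
    homogeneity = 1.0 if h_truth == 0 else 1.0 - conditional_entropy(truth, pred) / h_truth
    # Completeness: all members of a class land in the same cluster.
    completeness = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(pred, truth) / h_pred
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)
    return homogeneity, completeness, v


# Hypothetical toy labels: four images from two sources, each image
# placed in its own cluster (pure clusters, but classes are split).
truth = ["source_a", "source_a", "source_b", "source_b"]
pred = [0, 1, 2, 3]
h, c, v = v_measure(truth, pred)
```

In this toy case every cluster is pure, so homogeneity is high, while splitting each source across two clusters lowers completeness; the V-measure is the (weighted) harmonic mean of the two.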
Keywords
image spam, clustering, multimodal analysis, convolutional autoencoders (CAEs), hybrid algorithm