A novel cross-domain adaptation framework for unsupervised criminal jargon detection via pre-trained contextual embedding of darknet corpus

Expert Systems with Applications (2024)

Abstract
As regulation of the surface web becomes more stringent, criminals are gradually turning to darknet markets for illicit operations. Moderating and studying the content of these marketplaces helps combat criminal activity on the darknet. To evade law-enforcement surveillance, however, criminals widely use jargon as a disguise in their conversations. Jargon gives seemingly innocuous words cryptic new meanings, which creates a major challenge for criminal investigation. Current research on Chinese jargon detection focuses on keyword matching, an approach that cannot keep up with the rapid emergence of new jargon across domains. To the best of our knowledge, we are the first to study Chinese jargon detection in darknet markets. Specifically, we design an unsupervised cross-domain adaptation framework for Chinese jargon detection (CJD-Framework) that integrates a pre-trained language model. First, six Chinese-language underground markets are crawled to build the first darknet corpus dataset (DC-dataset). Next, a word-based Chinese pre-training model is proposed to extract contextual embeddings for darknet words. Finally, a cross-corpus framework based on semantic similarity analysis is constructed to identify Chinese jargon in the darknet. Comprehensive experiments demonstrate the effectiveness and generalizability of the CJD-Framework compared with state-of-the-art models, with a detection accuracy of 91.5%. The darknet corpus dataset and the framework proposed in this research can provide resources and ideas for future analysis of underground crime in darknet markets.
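The cross-corpus similarity step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each word already has one embedding learned from the darknet corpus and one from a general reference corpus, that both sets live in a comparable vector space, and that a word whose two embeddings diverge is a jargon candidate. The function names, the toy English words, the three-dimensional vectors, and the threshold of 0.5 are all hypothetical.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def flag_jargon_candidates(darknet_vecs, reference_vecs, threshold=0.5):
    """Return (word, similarity) pairs whose darknet embedding diverges
    from the reference embedding, most divergent first.

    Assumption: both dictionaries map words to vectors in a shared space.
    The threshold is an illustrative value, not one taken from the paper.
    """
    candidates = []
    for word, dark_vec in darknet_vecs.items():
        ref_vec = reference_vecs.get(word)
        if ref_vec is None:
            continue  # word absent from the reference corpus; skipped here
        sim = cosine(dark_vec, ref_vec)
        if sim < threshold:
            candidates.append((word, sim))
    return sorted(candidates, key=lambda pair: pair[1])


# Toy vectors (hypothetical): "snow" is used very differently in the two
# corpora, while "price" keeps roughly the same meaning.
darknet_vecs = {
    "snow": np.array([0.9, 0.1, 0.0]),
    "price": np.array([0.1, 0.9, 0.1]),
}
reference_vecs = {
    "snow": np.array([0.0, 0.2, 0.9]),
    "price": np.array([0.1, 0.8, 0.2]),
}

candidates = flag_jargon_candidates(darknet_vecs, reference_vecs)
flagged_words = [word for word, _ in candidates]
print(flagged_words)
```

In this toy setup only the word whose usage differs between the two corpora falls below the threshold. A real pipeline would obtain the vectors from a pre-trained contextual model and would need some way to make the two embedding spaces comparable, which this sketch simply assumes.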
Keywords
Jargon detection, Underground economy, Darknet markets, Unsupervised learning, Language model