Comparisons of machine learning techniques for detecting fraudulent criminal identities

Hassan Kazemian, Subeksha Shrestha

Expert Syst. Appl. (2023)

Abstract
This paper applies various machine learning techniques to an anonymized policing dataset used in the EU SPIRIT Horizon 2020 project, with the aim of identifying fraudulent identities and helping Law Enforcement Agencies (LEAs) find potential criminals and carry out identity resolution. A lack of qualitative data and of an appropriate methodology is a common reason for the limited research on fraudulent criminal identities. In addition, the data is highly sensitive, and even a minor prediction error can have a serious impact on society: genuine people could be questioned, whereas criminals could go free. This paper addresses both issues by using 39 million records from the policing dataset and by working towards higher accuracy while building the model. Various machine learning approaches are trained on the dataset to make predictions, and the research focuses on predicting the 5 suspected fraudulent identities among the 39 million records. One of the applied techniques is a TensorFlow model built with Keras, which has seldom been applied by researchers to the detection of criminal data. To compare results and test the accuracy of the TensorFlow model, Support Vector Machine, Naive Bayes and K-nearest Neighbours are also applied, giving a comparative study of the outcomes obtained from each model. The goal of this research is to find fraudulent IDs amongst all the anonymized IDs in the criminal dataset using TensorFlow and three other machine learning models, and to select the best-performing model. Because the model compares two names, the string-matching techniques Levenshtein edit distance, Hamming distance, Jaro-Winkler and Soundex were evaluated first, so that an effective approach could be selected before building the model and analysing the results. The TensorFlow model demonstrated the highest accuracy with the lowest relative execution time, and it was the only model to successfully predict all 5 suspects from the policing dataset.
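As an illustration of the name-comparison step, the sketch below implements two of the string-matching measures named in the abstract, Levenshtein edit distance and Hamming distance, in plain Python. It is a minimal sketch, not the authors' implementation: the function names, the lower-casing of names, and the normalised `name_similarity` score are assumptions made here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def name_similarity(a: str, b: str) -> float:
    """Hypothetical helper: edit distance scaled to [0, 1], 1.0 = identical.
    Case and surrounding whitespace are ignored (an assumption)."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(hamming("karolin", "kathrin"))     # 3
    print(name_similarity("Smith", "smith")) # 1.0
```

Hamming distance is only defined for strings of equal length, which is one reason an edit-distance or phonetic measure may suit names of varying length better; a score such as `name_similarity` could then serve as an input feature to a classifier.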
Keywords
Identity resolution, Policing dataset, TensorFlow, Support vector machine, K-nearest neighbour, Naive Bayes