Sorting the Digital Stream: Big Data-driven Insights into Email Classification for Spam and Ham Detection.

Syed Attique Shah, Emil Anthony Arputham, Awais Ahmed, Mohamed Ben Farah, Attal Shah, Abdul Aziz

2023 IEEE International Conference on Big Data (BigData), 2023

Abstract
In contemporary email communication, the ever-expanding volume of digital correspondence has ushered in an era where big data plays a pivotal role in addressing the challenge of distinguishing between legitimate (ham) and unsolicited (spam) emails. The primary objective of this paper is the meticulous identification and establishment of criteria for the discrimination between ham and spam emails. To achieve this, the study harnesses data from three distinct datasets, aiming to identify common attributes shared across all emails, irrespective of their classification, while concurrently devising methodologies for precise spam detection. Central to this endeavor is the evaluation of the effectiveness of feature selection techniques, specifically Chi-Square and Pearson Correlation, in elevating the accuracy of email classification. The investigation extends to assessing how the combination of these feature selection techniques with the broader machine learning framework can be optimized. This optimization entails the application of diverse preprocessing techniques to the datasets, all designed to amplify the precision of email classification. Furthermore, this research scrutinizes the performance evaluation metrics employed in the assessment of email classifiers. By conducting comprehensive experiments, the study identifies optimal classifiers based on rigorous evaluation metrics. This contributes valuable insights to the toolkit of techniques for proficient email classification within the realm of big data analysis.
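
The abstract names Chi-Square and Pearson Correlation as the feature selection techniques under evaluation but does not describe how they are applied. The following is only a minimal sketch of how such scores might rank features, not the authors' pipeline. It assumes binary (0/1) features and binary labels (1 = spam, 0 = ham); the feature names, the toy data, and the helper functions are all hypothetical.

```python
import math


def chi_square_score(feature, labels):
    """Chi-square statistic of a 2x2 table: binary feature vs. binary label."""
    n = len(labels)
    observed = [[0, 0], [0, 0]]  # observed[feature_value][label]
    for f, y in zip(feature, labels):
        observed[f][y] += 1
    row = [sum(observed[0]), sum(observed[1])]
    col = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    score = 0.0
    for f in (0, 1):
        for y in (0, 1):
            expected = row[f] * col[y] / n
            if expected > 0:  # skip empty rows/columns (constant feature)
                score += (observed[f][y] - expected) ** 2 / expected
    return score


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_features(columns, labels, score_fn):
    """Order feature names by the absolute value of their score, highest first."""
    scores = {name: score_fn(col, labels) for name, col in columns.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)


# Hypothetical toy data: six emails, 1 = spam, 0 = ham.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "has_subject": [1, 1, 1, 1, 1, 1],       # constant, carries no signal
    "contains_meeting": [0, 0, 1, 1, 1, 0],  # weakly associated with ham
    "contains_free": [1, 1, 1, 0, 0, 0],     # perfectly associated with spam
}

chi2_ranking = rank_features(columns, labels, chi_square_score)
pearson_ranking = rank_features(columns, labels, pearson_correlation)
```

In a selection step of this kind, the top-k names from either ranking would be kept and passed on to a classifier. On this toy data both criteria produce the same order, which need not hold on real email datasets.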
Keywords
Machine Learning, Classification, Email Spam Detection, Decision Tree, Naive Bayes, Artificial Neural Network, Data Pre-processing, Feature Selection