Empirical Analysis for the Selection of Baseline Performances for Short Text Classification

2023 IEEE 17th International Conference on Industrial and Information Systems (ICIIS)(2023)

Abstract
Improving classification performance for short text has become increasingly important due to the explosive growth of social media and other online communication platforms. As a result, recent research attempts to improve classification performance over a selected baseline. Unfortunately, recent work on short-text classification using neural networks (CNN, LSTM, RNN) and embeddings as feature representations has not used a common baseline against which to compare performance improvements, so it is hard to compare one work with another. This research was therefore carried out to identify the n-grams, feature representation, preprocessing technique, and machine learning algorithm best suited to a given short-text dataset, which can then serve as a baseline for comparing future experimental results. We used seven publicly available short-text datasets. We compared different n-grams (1-gram, 2-gram, 3-gram, etc.), two feature representation techniques, namely term frequency (TF) and term frequency with inverse document frequency (TF-IDF), as well as the impact of removing stop words. Moreover, we compared classification performance among several traditional machine learning algorithms. Our conclusions are: (1) combining 1-gram and 2-gram word features gave higher performance than other n-grams; (2) of the two feature representation techniques, TF performed better than TF-IDF; (3) removing stop words further improved classification performance when the feature representation is TF; and (4) among the traditional machine learning algorithms, SVM gave the best results. Our findings can be used to compute baseline classification performances for future research.
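The best-performing configuration reported in the abstract (TF counts over combined 1-gram and 2-gram word features, with stop words removed) can be sketched in plain Python. The stop-word list and tokenizer below are illustrative assumptions; the paper does not publish its exact preprocessing details.

```python
from collections import Counter

# Hypothetical stop-word list for illustration only;
# the paper does not specify which list it used.
STOP_WORDS = {"the", "is", "a", "an", "of", "to", "and"}

def tf_features(text, remove_stop_words=True):
    """Sketch of the reported best baseline feature set:
    term-frequency (TF) counts over combined 1-gram and 2-gram
    word features, with optional stop-word removal."""
    tokens = text.lower().split()          # naive whitespace tokenizer (assumption)
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    unigrams = tokens
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return Counter(unigrams + bigrams)     # raw TF, not TF-IDF

feats = tf_features("the service of the shop is great")
# After stop-word removal the tokens are ["service", "shop", "great"],
# so both unigram and bigram features are counted, e.g. feats["service shop"] == 1.
```

In a full experiment these TF vectors would be fed to a linear SVM, the classifier the paper found strongest among the traditional algorithms it compared.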
Keywords
n-grams, TF, TF-IDF, preprocessing, SVM