Named Entity Recognition Utilized to Enhance Text Classification While Preserving Privacy

IEEE Access(2023)

Abstract
Recent developments in Natural Language Processing (NLP) techniques have encouraged NLP-based applications in various fields, including business, law, and health. An important step in all NLP projects is text preprocessing, which modifies text data before it is used in a machine learning model. Text preprocessing usually includes cleaning, filtering, removing, and replacing parts of the text to increase model accuracy and robustness, reduce data size, or preserve privacy. A named entity recognizer (NER) is an NLP tool that finds named entities in text, such as names, organizations, addresses, numbers, and dates. In this work, we create a preprocessing approach that uses NER to find named entities and then replaces them with their type (e.g., location, person, or organization name), instead of removing them or letting them become noise in the data, in order to improve accuracy and preserve privacy. Experiments on the text classification task using our approach were conducted on several datasets, some of which were collected in-house. The results indicate that this approach enhances classifier accuracy and reduces the dimensionality of the feature representation while also preserving privacy.
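The replacement step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the NER tool has already returned non-overlapping character spans as `(start, end, label)` tuples, and the helper names `mask_entities` and `vocabulary`, the labels `PERSON` and `LOCATION`, and the `<TYPE>` placeholder style are all hypothetical. The size of a whitespace-token vocabulary stands in, crudely, for feature dimensionality.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical span format: (start_char, end_char, label), end exclusive.
# In practice these spans would come from an NER tool; here they are hand-written.
Span = Tuple[int, int, str]


def mask_entities(text: str, spans: List[Span]) -> str:
    """Replace each (non-overlapping) entity span with a <LABEL> placeholder."""
    pieces = []
    prev = 0
    for start, end, label in sorted(spans):  # process spans left to right
        pieces.append(text[prev:start])      # keep text before the entity
        pieces.append(f"<{label}>")          # substitute the entity's type
        prev = end
    pieces.append(text[prev:])               # keep trailing text
    return "".join(pieces)


def vocabulary(docs: Iterable[str]) -> Set[str]:
    """Whitespace-token vocabulary: a rough proxy for feature dimensionality."""
    return {tok for doc in docs for tok in doc.split()}


docs = ["Alice flew to Paris", "Bob flew to Berlin"]
spans = [
    [(0, 5, "PERSON"), (14, 19, "LOCATION")],
    [(0, 3, "PERSON"), (12, 18, "LOCATION")],
]
masked = [mask_entities(d, s) for d, s in zip(docs, spans)]
print(masked)  # ['<PERSON> flew to <LOCATION>', '<PERSON> flew to <LOCATION>']
print(len(vocabulary(docs)), len(vocabulary(masked)))  # 6 4
```

In this toy example, the two distinct names and two distinct cities collapse into two shared type tokens. The vocabulary therefore shrinks from 6 to 4, and the masked text no longer contains the original personal or location names. With spaCy, for instance, the spans could be built from `doc.ents` using each entity's `start_char`, `end_char`, and `label_` attributes. Note that spaCy's default English labels are `PERSON`, `GPE`/`LOC`, and `ORG` rather than the illustrative names used here.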
Keywords
Named entities, preprocessing, text classification, privacy