TMD-NER: Turkish multi-domain named entity recognition for informal texts

Signal, Image and Video Processing (2023)

Abstract
We examine named entity recognition (NER), an essential and widely used first step in many natural language processing tasks, including chatbots and language translation. We focus on applying NER to noisy texts such as tweets, which is difficult because of the casual, unstructured language common on these platforms. In this study, we use the largest available labeled datasets for Turkish NER, targeting three informal platforms: Twitter, Facebook, and Donanimhaber. We choose Turkish as a representative agglutinative language, whose structure differs significantly from that of other well-known languages such as English, French, and German. We emphasize that the methodologies and insights gained from this study can be extended to other agglutinative languages, such as Finnish, Hungarian, Japanese, and Korean. We apply NER to these datasets with 16 different named entity tags, using a framework of bidirectional long short-term memory (BiLSTM) networks followed by conditional random fields (CRF), together known as the BiLSTM-CRF model. Our experiments show an F1 score of 84
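In a BiLSTM-CRF tagger, the BiLSTM produces a score for every (token, tag) pair, and the CRF layer adds tag-to-tag transition scores so that the predicted label sequence is chosen jointly rather than token by token. The sketch below is a minimal, hypothetical illustration of that decoding step using standard Viterbi decoding for a linear-chain CRF. It is not the authors' code: the function name `viterbi_decode`, the three-tag BIO subset (standing in for the paper's 16 entity tags), and all scores are made up for illustration, and start/end transitions and training are omitted.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag index sequence for a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (e.g. a BiLSTM output projection).
    transitions[i][j]: score of moving from tag i to tag j.
    Start/end transition scores are omitted in this sketch.
    """
    n_tags = len(transitions)
    # best[j]: best total score of any tag path ending in tag j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Choose the previous tag i that maximizes best[i] + transitions[i][j]
            scores = [best[i] + transitions[i][j] for i in range(n_tags)]
            i_star = max(range(n_tags), key=scores.__getitem__)
            new_best.append(scores[i_star] + emissions[t][j])
            pointers.append(i_star)
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the highest-scoring final tag
    path = [max(range(n_tags), key=best.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical example: a 3-tag subset, not the paper's actual 16-tag inventory
TAGS = ["O", "B-PER", "I-PER"]
NEG = -1e4  # large penalty that effectively forbids a transition
TRANSITIONS = [
    [0.0, 0.0, NEG],  # from O: O -> I-PER is disallowed
    [0.0, 0.0, 0.5],  # from B-PER
    [0.0, 0.0, 0.5],  # from I-PER
]
# Made-up emission scores for the tokens "Ali", "Yılmaz", "geldi"
EMISSIONS = [
    [0.1, 2.0, 1.5],
    [0.2, 0.3, 1.0],
    [2.0, 0.1, 0.2],
]

path = viterbi_decode(EMISSIONS, TRANSITIONS)
print([TAGS[i] for i in path])  # → ['B-PER', 'I-PER', 'O']
```

The transition matrix is what lets the CRF rule out invalid sequences such as an `I-PER` directly after `O`, something an independent per-token classifier on top of the BiLSTM cannot enforce.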
Keywords
Named entity recognition, Turkish language, Bidirectional long short-term memory, Conditional random fields