Improved Named Entity Tagset for Punjabi Language

Engineering and Computational Sciences (2014)

Abstract
An annotated corpus plays an important role in developing a machine-learning-based Named Entity Recognition system. Before creating an annotated corpus, it is important to decide on the Named Entity Tagset to be used. A Named Entity Tagset is a collection of tags or labels, organized as a scheme, each indicating the named entity class to which a word in the text belongs. In this paper we propose an improved Named Entity Tagset of 14 tags for Named Entity Recognition in the Punjabi language. The improvement arose from challenges faced during the annotation process in our previous research work, which used 12 tags. We also discuss the importance of, and issues related to, defining a Named Entity Tagset and annotation guidelines, and we review various global tagsets found in the literature. We referred to the Extended Named Entity Hierarchy to improve our current tagset.
Keywords
learning (artificial intelligence), natural language processing, Punjabi language, annotated corpus, annotation guidelines, extended named entity hierarchy, machine learning based named entity recognition system, named entity tagset, named entity recognition, Punjabi, tagset design issues, geology, organizations