Attribute Extraction by Combining Feature Ranking and Sequence Labeling

2018 IEEE International Conference on Big Data and Smart Computing (BigComp)(2018)

Abstract
Because of the characteristics of the Chinese language, knowledge extraction from Chinese text documents is challenging. In this paper, we propose an attribute extraction method based on feature ranking and sequence labeling. First, we obtain a training corpus by annotating Wikipedia texts with attribute information extracted from Wikipedia infoboxes. To improve the quality of the training corpus, trigger keywords are filtered based on information entropy. Attribute extraction is treated as a sequence labeling problem that exploits multidimensional features such as part of speech and word context. A conditional random field (CRF) model is then trained on the corpus to extract attributes from unstructured text. Experimental results show that the keyword filtering technique effectively improves the quality of the training corpus and hence the performance of attribute extraction. Compared with rule-based attribute extraction methods, our method can be extended to other domains, giving it better portability and extensibility.
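The data-preparation steps described in the abstract (infobox-based annotation, entropy-based trigger keyword filtering, and POS/context features for sequence labeling) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made for illustration: tokens are pre-segmented, labels use a BIO scheme, the hypothetical functions `bio_annotate`, `keyword_entropy`, `filter_keywords`, and `token_features` are invented here, the entropy criterion is interpreted as the Shannon entropy of the attribute distribution co-occurring with each trigger keyword (low entropy meaning the keyword reliably signals one attribute), and the threshold and context window size are arbitrary.

```python
import math
from collections import Counter, defaultdict


def bio_annotate(tokens, value_tokens, attribute):
    """Distant supervision (assumed scheme): tag the first exact match of an
    infobox value in the sentence with B-/I- labels; all other tokens get O."""
    labels = ["O"] * len(tokens)
    n = len(value_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == value_tokens:
            labels[i] = "B-" + attribute
            for j in range(i + 1, i + n):
                labels[j] = "I-" + attribute
            break
    return labels


def keyword_entropy(samples):
    """Shannon entropy (bits) of the attribute labels seen with each keyword.

    samples: iterable of (trigger_keyword, attribute) pairs.
    """
    by_kw = defaultdict(Counter)
    for kw, attr in samples:
        by_kw[kw][attr] += 1
    result = {}
    for kw, counts in by_kw.items():
        total = sum(counts.values())
        result[kw] = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return result


def filter_keywords(samples, max_entropy=1.0):
    """Keep keywords whose attribute distribution is concentrated (low entropy).
    The threshold is a hypothetical parameter, not a value from the paper."""
    return {kw for kw, h in keyword_entropy(samples).items() if h <= max_entropy}


def token_features(tokens, pos_tags, i, window=2):
    """Feature dict for token i: the word, its POS tag, and the words/POS tags
    within +/- window positions (out-of-range neighbours become <PAD>)."""
    feats = {"word": tokens[i], "pos": pos_tags[i]}
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        if 0 <= j < len(tokens):
            feats[f"word[{off:+d}]"] = tokens[j]
            feats[f"pos[{off:+d}]"] = pos_tags[j]
        else:
            feats[f"word[{off:+d}]"] = "<PAD>"
    return feats


def sentence_features(tokens, pos_tags):
    """One feature dict per token, i.e. one CRF input sequence."""
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Toy, hand-made example: "Li Bai was born in Suyab", with illustrative POS tags.
    tokens = ["李白", "出生", "于", "碎叶城"]
    pos = ["nr", "v", "p", "ns"]
    print(bio_annotate(tokens, ["碎叶城"], "birthplace"))
    # -> ['O', 'O', 'O', 'B-birthplace']

    # Hypothetical (keyword, attribute) co-occurrences; "位于" is ambiguous (entropy 1.0).
    samples = [("出生", "birthplace"), ("出生", "birthplace"), ("出生", "birth_date"),
               ("毕业", "education"), ("位于", "location"), ("位于", "birthplace")]
    print(filter_keywords(samples, max_entropy=0.95))
    print(sentence_features(tokens, pos)[3])
```

The per-sentence feature lists from `sentence_features`, paired with the matching BIO label lists, follow the list-of-lists-of-dicts shape that CRF toolkits such as sklearn-crfsuite accept in `CRF.fit(X, y)`. CRF training itself is left out to keep the sketch dependency-free.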
Keywords
attribute extraction, feature ranking, sequence labeling, conditional random field