Using Part-of-Speech Reranking to Improve Chinese Word Segmentation

SIGHAN@COLING/ACL(2006)

Abstract
Chinese word segmentation and Part-of-Speech (POS) tagging have commonly been treated as two separate tasks. In this paper, we present a system that performs Chinese word segmentation and POS tagging simultaneously. We train a segmenter model and a tagger model separately, both based on linear-chain Conditional Random Fields (CRF), using lexical, morphological and semantic features. We propose an approximate joint decoding method that reranks the N-best segmenter output based on POS tagging information. Experimental results on the SIGHAN Bakeoff dataset and the Penn Chinese Treebank show that our reranking method significantly improves both segmentation and POS tagging accuracy.
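
The reranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not state how the segmenter and tagger scores are combined, so the weighted sum of log-scores used here is an assumption, and the names `rerank`, `toy_tag_score`, `LEXICON` and the numeric scores are all hypothetical. A real system would replace `toy_tag_score` with the log-probability of the best tag sequence from a trained CRF tagger.

```python
from typing import Callable, Sequence, Tuple

Segmentation = Tuple[str, ...]


def rerank(
    nbest: Sequence[Tuple[Segmentation, float]],
    tag_score: Callable[[Segmentation], float],
    weight: float = 1.0,
) -> Segmentation:
    """Pick the candidate maximizing seg_score + weight * tag_score.

    The linear combination is an assumed scoring rule, not one taken
    from the paper.
    """
    if not nbest:
        raise ValueError("nbest must contain at least one candidate")
    best, _ = max(nbest, key=lambda cand: cand[1] + weight * tag_score(cand[0]))
    return best


# Hypothetical stand-in for a POS tagger score: words found in a tiny
# lexicon are cheap to tag, unknown words are heavily penalized.
LEXICON = {"研究", "生命", "起源", "研究生"}


def toy_tag_score(seg: Segmentation) -> float:
    return sum(-0.5 if word in LEXICON else -5.0 for word in seg)


# Two candidate segmentations with made-up segmenter log-scores.
# The segmenter alone slightly prefers the first one.
nbest = [
    (("研究生", "命", "起源"), -1.0),
    (("研究", "生命", "起源"), -1.2),
]

best = rerank(nbest, toy_tag_score)
print(" / ".join(best))
```

In this toy setup the tagger penalty on the isolated word "命" outweighs the small segmenter preference, so the second candidate is selected; setting `weight=0.0` falls back to the segmenter's own 1-best output.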
Keywords
part of speech