Morphological Analysis of a Large Spontaneous Speech Corpus in Japanese
IJCAI'07: Proceedings of the 20th International Joint Conference on Artificial Intelligence (2007)
Abstract
This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and shows how to tag a large spontaneous speech corpus accurately by using the two methods. The first method detects word segments of any type. The second method is used when there are several definitions of word segments and their POS categories, and when one type of word segment includes another. We show that semi-automatic analysis achieves a precision of better than 99% for detecting and tagging short words and 97% for long words, the two types of words that comprise the corpus. We also show that using both methods yields better accuracy than using only the first.
Keywords
word segment, word unit, labor cost, morphological annotation, low labor cost, large spontaneous speech corpus, active learning, better accuracy, morphological information, Japanese spontaneous speech corpus, high accuracy, morphological analysis, training corpus, long word, short word, POS category, semi-automatic analysis, efficient framework, human-aided morphological annotation, spontaneous Japanese