Bayesian semi-supervised Chinese word segmentation for statistical machine translation

COLING(2008)

Abstract
Words in Chinese text are not naturally separated by delimiters, which poses a challenge to standard machine translation (MT) systems. In MT, the widely used approach is to apply a Chinese word segmenter trained from manually annotated data, using a fixed lexicon. Such word segmentation is not necessarily optimal for translation. We propose a Bayesian semi-supervised Chinese word segmentation model which uses both monolingual and bilingual information to derive a segmentation suitable for MT. Experiments show that our method improves a state-of-the-art MT system in a small and a large data environment.
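The abstract does not give the model's equations. As a hedged illustration of what a Bayesian segmentation model scores, here is a minimal sketch of a generic Dirichlet-process unigram word model. It is not the paper's model: it is purely monolingual, treats word counts as fixed instead of resampling them, and ignores the bilingual information the abstract mentions. The base distribution `base_prob`, its parameters `vocab_size` and `p_end`, the concentration `alpha`, and the toy counts are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def base_prob(word, vocab_size=5000, p_end=0.5):
    """Hypothetical base distribution P0: characters drawn uniformly from
    vocab_size symbols, with a geometric word-length distribution."""
    n = len(word)
    return (1.0 / vocab_size) ** n * p_end * (1.0 - p_end) ** (n - 1)


def predictive_prob(word, counts, total, alpha=1.0):
    """Dirichlet-process (Chinese restaurant process) predictive probability
    of `word` given observed word counts and concentration alpha."""
    return (counts[word] + alpha * base_prob(word)) / (total + alpha)


def segment(text, counts, alpha=1.0, max_len=4):
    """Viterbi search for the most probable segmentation of `text`,
    holding the counts fixed (an approximation: no Gibbs sampling)."""
    total = sum(counts.values())
    best = [0.0] + [-math.inf] * len(text)  # best log-prob of text[:i]
    back = [0] * (len(text) + 1)            # start index of the last word
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - max_len), i):
            p = predictive_prob(text[j:i], counts, total, alpha)
            score = best[j] + math.log(p)
            if score > best[i]:
                best[i], back[i] = score, j
    # Follow back-pointers from the end of the string to recover the words.
    words, i = [], len(text)
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


# Toy counts (hypothetical): frequent two-character words outweigh single characters.
counts = Counter({"中国": 5, "人民": 4, "中": 1, "国": 1, "人": 1, "民": 1})
print(segment("中国人民", counts))  # prints ['中国', '人民']
```

Unseen multi-character strings receive only the small `alpha * P0` mass, so the search prefers previously observed words. The rich-get-richer behavior of the Dirichlet-process prior is what lets such models induce a lexicon rather than rely on a fixed one.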
Keywords
statistical machine translation,chinese word segmenter,large data environment,state-of-the-art mt system,word segmentation,chinese text,standard machine translation,annotated data,bilingual information,fixed lexicon,chinese word segmentation model,machine translation