Unsupervised Named Entity Transliteration Using Temporal and Phonetic Correlation.

EMNLP '06: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (2006)

Abstract
In this paper we investigate unsupervised name transliteration using comparable corpora: corpora in which texts in the two languages deal with some of the same topics, and therefore share references to named entities, but are not translations of each other. We present two distinct methods for transliteration, one using an unsupervised phonetic transliteration method and the other using the temporal distribution of candidate pairs. Each approach works quite well on its own, but combining them achieves even better results. We believe that the novelty of our approach lies in the phonetic-based scoring method, which combines carefully crafted phonetic features with empirical results from the pronunciation errors of second-language learners of English. Unlike previous approaches to transliteration, this method can in principle work with any pair of languages in the absence of a training dictionary, provided one has an estimate of the pronunciation of words in text.
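To make the idea of combining the two signals concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's method: the abstract does not specify the scoring functions, so unit-cost edit distance over phoneme sequences stands in for the feature-based phonetic score, Pearson correlation over per-day frequency counts stands in for the temporal score, and the linear interpolation with `weight` is hypothetical. All names and the toy data are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length frequency series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no temporal signal
    return cov / sqrt(var_x * var_y)


def phoneme_edit_distance(p, q):
    """Unit-cost Levenshtein distance between two phoneme sequences.

    Assumption: a stand-in for a feature-based phonetic cost.
    """
    prev = list(range(len(q) + 1))
    for i, a in enumerate(p, start=1):
        curr = [i]
        for j, b in enumerate(q, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (a != b)))   # substitution
        prev = curr
    return prev[-1]


def combined_score(src, cand, weight=0.5):
    """Interpolate phonetic similarity and temporal correlation.

    Assumption: the linear weighting is hypothetical, chosen for illustration.
    """
    dist = phoneme_edit_distance(src["phones"], cand["phones"])
    longest = max(len(src["phones"]), len(cand["phones"]), 1)
    phonetic = 1.0 - dist / longest
    temporal = pearson(src["freq"], cand["freq"])
    return weight * phonetic + (1 - weight) * temporal


# Toy data (invented): phoneme sequences and per-day mention counts.
source = {"phones": ["k", "l", "i", "n", "t", "o", "n"], "freq": [0, 3, 5, 1, 0]}
candidates = {
    "A": {"phones": ["k", "e", "l", "i", "n", "t", "u", "n"], "freq": [0, 2, 6, 1, 0]},
    "B": {"phones": ["b", "u", "sh"], "freq": [4, 0, 0, 2, 5]},
}
best = max(candidates, key=lambda name: combined_score(source, candidates[name]))
```

In this toy setup, candidate "A" is both phonetically close to the source name and mentioned on the same days, so it outranks "B" on either signal alone and on the combination.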
Keywords
unsupervised name transliteration, unsupervised phonetic transliteration method, distinct method, phonetic-based scoring method, languages deal, phonetic feature, previous approach, pronunciation error, better result, candidate pair, entity transliteration, phonetic correlation