A statistical model for lost language decipherment

ACL(2010)

Abstract
In this paper we propose a method for the automatic decipherment of lost languages. Given a non-parallel corpus in a known related language, our model produces both alphabetic mappings and translations of words into their corresponding cognates. We employ a non-parametric Bayesian framework to simultaneously capture both low-level character mappings and high-level morphemic correspondences. This formulation enables us to encode some of the linguistic intuitions that have guided human decipherers. When applied to the ancient Semitic language Ugaritic, the model correctly maps 29 of 30 letters to their Hebrew counterparts, and deduces the correct Hebrew cognate for 60% of the Ugaritic words which have cognates in Hebrew.
Keywords
high-level morphemic correspondence, ancient semitic language, lost language decipherment, hebrew counterpart, statistical model, automatic decipherment, related language, alphabetic mapping, ugaritic word, correct hebrew cognate, corresponding cognate, lost language