From Image to Translation: Processing the Endangered Nyushu Script.

ACM Trans. Asian & Low-Resource Lang. Inf. Process. (2016)

Abstract
The lack of computational support has significantly slowed automatic understanding of endangered languages. In this paper, we take Nyushu (simplified Chinese: 女书; literally: “women’s writing”) as a case study to present the first computational approach that combines Computer Vision and Natural Language Processing techniques to deeply understand an endangered language. We developed an end-to-end system that reads a scanned handwritten Nyushu article, segments it into characters, links them to standard characters, and then translates the article into Mandarin Chinese. We propose several novel methods to address the new challenges introduced by noisy input and low resources, including Nyushu-specific feature selection for character segmentation and linking, and machine translation based on a character-linking lattice. The end-to-end performance indicates that the approach is promising and that the system can serve as a standard benchmark.
Keywords
Endangered languages, Nyushu, recognition, translation