Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik.

LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION(2016)

引用 25|浏览59
暂无评分
摘要
In Sorani Kurdish, one of the most useful orthographic features in named-entity recognition - capitalization - is absent, as the language's Perso-Arabic script does not make a distinction between uppercase and lowercase letters. We describe a system for deriving an inferred capitalization value from closely related languages by phonological similarity, and illustrate the system using several related Western Iranian languages.
更多
查看译文
关键词
Kurdish,named-entity recognition,phonology
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要