Automatic Orality Identification in Historical Texts.

LREC(2020)

引用 0|浏览5
暂无评分
摘要
Independently of the medial representation (written/spoken), language can exhibit characteristics of conceptual orality or literacy, which mainly manifest themselves on the lexical or syntactic level. In this paper we aim at automatically identifying conceptually-oral historical texts, with the ultimate goal of gaining knowledge about spoken data of historical time stages. We apply a set of general linguistic features that have been proven to be effective for the classification of modern language data to historical German texts from various registers. Many of the features turn out to be equally useful in determining the conceptuality of historical data as they are for modern data, especially the frequency of different types of pronouns and the ratio of verbs to nouns. Other features like sentence length, particles or interjections point to peculiarities of the historical data and reveal problems with the adoption of a feature set that was developed on modern language data.
更多
查看译文
关键词
historical texts
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要