Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection.

International Conference on Language Resources and Evaluation (LREC), 2022

Abstract
In this paper we present the Serbian part of ELTeC, a multilingual corpus of novels written between 1840 and 1920. The corpus is being built to test various distant reading methods and tools, with the aim of rethinking European literary history. We describe the steps that led to the production of the Serbian sub-collection: novel selection and retrieval, text preparation, structural annotation, POS-tagging, lemmatization, and named entity recognition. The Serbian sub-collection has been published on several platforms to make it freely available to a range of users. Several usage examples show that the sub-collection is useful for both close and distant reading approaches.
Keywords
Corpus, Distant Reading, Digital Humanities, Linked Data, Named Entity Recognition, Text Analytics