Slicing and Dicing a Newspaper Corpus for Historical Ecology Research.

EKAW (2018)

Abstract
Historical newspapers are a novel source of information for historical ecologists studying the interactions between humans and animals through time and space. Newspaper archives are particularly interesting to analyse because of their breadth and depth. However, the size and occasional noisiness of such archives also bring difficulties, as manual analysis is impossible. In this paper, we present experiments and results on automatic query expansion and categorisation for the perception of animal species between 1800 and 1940. We used lexicons for query expansion and to support the manual annotation process. For the categorisation, we trained a Support Vector Machine model. Our results indicate that we can distinguish newspaper articles that are about animal species from those that are not with an F1 score of 0.92, and subcategorise the different types of newspaper articles on animals with an F1 score of up to 0.84.
Keywords
Natural language processing, Lexicology, Humanities, Historical ecology, Digital libraries
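
To make two notions from the abstract concrete, the following is a minimal sketch of lexicon-based query expansion and of how an F1 score is computed from gold and predicted labels. It is not the authors' implementation: the function names `expand_query` and `f1_score`, the lexicon `ANIMAL_LEXICON`, and its entries are illustrative assumptions, and the Support Vector Machine training step is not shown.

```python
# Hypothetical lexicon mapping a species name to variant names.
# The entries are illustrative only and are not taken from the paper.
ANIMAL_LEXICON = {
    "polecat": ["fitch", "foumart"],
    "otter": ["river otter"],
}


def expand_query(term, lexicon):
    """Return the lower-cased term plus its lexicon variants, sorted, without duplicates."""
    key = term.lower()
    variants = lexicon.get(key, [])
    return sorted({key, *(v.lower() for v in variants)})


def f1_score(gold, predicted, positive=True):
    """Compute F1 (harmonic mean of precision and recall) for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Expand a search term before querying an archive.
    print(expand_query("Polecat", ANIMAL_LEXICON))
    # Score a toy prediction: True means "article is about an animal species".
    gold = [True, True, False, False]
    predicted = [True, False, True, False]
    print(f1_score(gold, predicted))
```

In this toy run there is one true positive, one false positive and one false negative, so precision and recall are both 0.5 and the F1 score is 0.5.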