Placing multi-modal, and multi-lingual Data in the Humanities Domain on the Map: The Mythotopia Geotagged Corpus

International Conference on Language Resources and Evaluation (LREC) (2022)

Abstract
The paper gives an account of an infrastructure that will be integrated into a platform aimed at providing a multi-faceted experience to visitors of Northern Greece, using mythology as a starting point. This infrastructure comprises a multi-lingual and multi-modal corpus (i.e., a corpus of textual data supplemented with images and video) belonging to the humanities domain, along with a dedicated database (content management system) with advanced indexing, linking, and search functionalities. We present the corpus itself, focusing on its content, the methodology adopted for its development, and the steps taken to make it accessible via the database in a way that also supports useful visualizations. In this context, we sought to address three main challenges: (a) adding a novel annotation layer, namely geotagging; (b) ensuring the long-term maintenance of, and accessibility to, the highly heterogeneous primary data, even after the life cycle of the current project, by adopting a metadata schema compatible with existing standards; and (c) making the corpus a useful resource for scholarly research in the digital humanities by adding a minimal set of linguistic annotations.
Keywords
cultural heritage and humanities corpus, cross-media indexing, geo-tagging benchmark