Development of a Greek biomedical corpus

msra(2007)

Abstract
The collection and annotation of specialized corpora for less-spoken languages such as Greek is a crucial endeavour for the development and growth of language technology research for these languages. This paper presents the design and compilation of a biomedical corpus, carried out within the framework of the national R&D project "IATROLEXI" (http://www.iatrolexi.gr). The aim of IATROLEXI is to create the critical infrastructure for the Greek language, i.e. linguistic resources and tools, to be used in advanced natural language processing (NLP) applications, such as information extraction and data mining, in the domain of biomedicine. The project will build upon existing resources developed by the project partners, namely a Greek morphological lexicon of about 100,000 words and language processing tools such as a lemmatizer and a morphosyntactic tagger. It will also develop new resources: the specialised corpus of biomedical texts presented in this paper and an ontology of medical terminology.
Keywords
NLP, corpus linguistics, biomedical terminology, biomedical corpus