Subdomain adaptation of a POS tagger with a small corpus.

LNLBioNLP '06: Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology (2006)

Abstract
For the domain of biomedical research abstracts, two large corpora are available: GENIA (Kim et al., 2003) and Penn BioIE (Kulik et al., 2004). Both are drawn mainly from the human domain, and it is unknown how systems trained on these corpora perform when applied to abstracts dealing with other species. In machine-learning-based systems, re-training the model with additional corpora from the target domain has achieved promising results (e.g., Tsuruoka et al., 2005; Lease et al., 2005). In this paper, we compare two methods for adapting POS taggers trained on the GENIA and Penn BioIE corpora to the Drosophila melanogaster (fruit fly) domain.
Keywords
human domain, target domain, Penn BioIE, Penn BioIE corpus, Drosophila melanogaster, biomedical research abstract, large corpus, machine-learning-based system, promising result, POS tagger, Subdomain adaptation, small corpus