Call for Discussion: Building a New Standard Dataset for Relation Extraction Tasks.

AKBC@NAACL-HLT(2016)

引用 3|浏览29
暂无评分
摘要
This paper is an attempt to raise pertinent questions and act as platform to generate fruitful discussions within the AKBC community about the need for a large scale dataset for relation extraction. For proper training and evaluation of relation extraction tasks, the weaknesses of datasets used so far need to be tackled: mainly the size (too small) and the amount of data that is actually labelled (unlabelled data leading to recall problems). We have the vision of building a new large and fully labelled dataset for entity pairs connected via binary relations from both Freebase as well as other datasets, such as Clueweb. Concerning the process of building, we present pioneering work on a roadmap which will serve as the foundation for the intended discussion within the community. Points to discuss arise within the following steps: first, the source data has to be preprocessed in order to ensure that the set of relations consists of valid relations only; second, we suggest a method to find the most relevant relations for an entity pair; and third, we outline approaches on how to actually label the data. It is necessary to discuss several key issues in the process of generating this dataset. This will enable us to thoroughly create a dataset that will have the potential to serve as a standard to the community.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要