Extracting And Matching Authors And Affiliations In Scholarly Documents

JCDL '13: 13th ACM/IEEE-CS Joint Conference on Digital Libraries, Indianapolis, Indiana, USA, July 2013

Abstract
We introduce Enlil, an information extraction system that discovers the institutional affiliations of authors in scholarly papers. Enlil works in two steps: a conditional random field first identifies authors and affiliations, and a support vector machine then connects authors to their affiliations. We benchmark Enlil in three separate experiments drawn from three different sources: the ACL Anthology, the ACM Digital Library, and a set of cross-disciplinary scientific journal articles acquired by querying Google Scholar. Against a state-of-the-art production baseline, Enlil reports a statistically significant improvement in F-1 of nearly 10% (p << 0.01). In the case of multidisciplinary articles from Google Scholar, Enlil is benchmarked over both clean input (F-1 > 90%) and automatically acquired input (F-1 > 80%). We have deployed Enlil in a case study of Asian genomics research publication patterns to understand how government-sponsored collaborative links evolve. Enlil has enabled our team to construct and validate new metrics that quantify the facilitation of research as opposed to direct publication.
Keywords
Metadata Extraction, Logical Structure Discovery, Conditional Random Fields, Support Vector Machine, Rich Document Features