CRF-based Bibliography Extraction from Reference Strings Focusing on Various Token Granularities

Document Analysis Systems (2012)

Abstract
The references of academic articles contain important bibliographic elements such as authors' names and article titles. Extracting these elements automatically is useful because they can serve various purposes, including search. This paper proposes a method for automatically extracting bibliographic elements from the text of reference strings. The method assigns bibliographic labels to reference strings using linguistic information and conditional random fields (CRFs). Experimental results indicated that extraction accuracy for the major bibliographic elements exceeded 96%.
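To make the idea of labeling a reference string at different token granularities more concrete, here is a minimal sketch in Python. It is not the paper's implementation: the abstract does not specify the tokenization rules, the feature set, or the label set, so the function names (`tokenize_words`, `tokenize_by_delimiters`, `token_features`, `featurize`), the delimiter choices, and the features are all hypothetical. The sketch only prepares per-token feature dictionaries of the kind a CRF sequence labeler could consume; no model is trained here.

```python
import re
from typing import Dict, List, Union

Feature = Union[str, bool]


def tokenize_words(reference: str) -> List[str]:
    """Fine granularity (assumed): word tokens plus single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", reference)


def tokenize_by_delimiters(reference: str) -> List[str]:
    """Coarse granularity (assumed): chunks separated by ',', ';' or ':'."""
    chunks = re.split(r"\s*[,;:]\s*", reference)
    return [chunk for chunk in chunks if chunk]


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Illustrative surface features for token i and its neighbours."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_digit": tok.isdigit(),
        "looks_like_year": bool(re.fullmatch(r"(19|20)\d{2}", tok)),
        "is_punct": bool(re.fullmatch(r"[^\w\s]", tok)),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def featurize(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dict per token, in sequence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    reference = "Smith J, Jones A: Parsing citations, Journal of Examples, 2010"
    print(tokenize_words(reference))
    print(tokenize_by_delimiters(reference))
    for feats in featurize(tokenize_words(reference)):
        print(feats)
```

In a full pipeline, each feature sequence would be paired with a label sequence (for example author, title, or year labels, which are assumptions here) and passed to a CRF trainer. The granularity matters because the same reference yields a different sequence length and different features depending on how it is split.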
Keywords
reference string, bibliographic element, reference strings, important bibliographic element, various token granularities, academic article, article title, conditional random field, extraction accuracy, automatic extraction, bibliographic label, crf-based bibliography extraction, data mining, digital signal processing, probability, random processes, delimiter, labeling, tokenization, citation analysis, accuracy, linguistic information, data models, hidden markov models, conditional probability, text analysis