Neural ParsCit: a deep learning-based reference string parser
Int. J. on Digital Libraries (2018)
Abstract
We present a deep learning approach for the core digital libraries task of parsing bibliographic reference strings. We deploy the state-of-the-art long short-term memory (LSTM) neural network architecture, a variant of the recurrent neural network, to capture long-range dependencies in reference strings. We explore word embeddings and character-based word embeddings as an alternative to handcrafted features. We incrementally experiment with features, architectural configurations, and the diversity of the dataset. Our final model is an LSTM-based architecture that layers a linear-chain conditional random field (CRF) over the LSTM output. In extensive experiments on both English in-domain (computer science) and out-of-domain (humanities) test cases, as well as on multilingual data, our results show a significant gain (p < 0.01) over the reported state-of-the-art CRF-only parser.
Keywords
Reference string parsing, Sequence labeling, CRF, LSTM
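To illustrate what layering a linear-chain CRF over per-token scores means in practice, the following is a minimal sketch of Viterbi decoding, the standard way to pick the best label sequence in such a model. It is not the authors' implementation: the function name `viterbi_decode`, the label set, and all score values are hypothetical, and the emission scores here are hand-written stand-ins for what an LSTM would output.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions[t][j]   : score of label j at token t (assumed to come from an LSTM).
    transitions[i][j] : score of moving from label i to label j.
    """
    if not emissions:
        return []
    n_labels = len(emissions[0])
    # Best score of any path ending in each label at the current token.
    score = list(emissions[0])
    backptr: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        ptrs: List[int] = []
        for j in range(n_labels):
            # Best previous label for arriving at label j.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptr.append(ptrs)
    # Follow back-pointers from the best final label.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptr):
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical label set and scores, for illustration only.
LABELS = ["author", "title", "year"]
EMISSIONS = [[2.0, 0.5, 0.0], [0.4, 0.5, 0.0], [0.0, 0.2, 1.5]]
TRANSITIONS = [[1.0, 0.5, -1.0], [-2.0, 1.0, 0.5], [-2.0, -2.0, 1.0]]

if __name__ == "__main__":
    print([LABELS[k] for k in viterbi_decode(EMISSIONS, TRANSITIONS)])
```

The transition matrix is what distinguishes this from labeling each token independently: it lets the model penalize implausible orderings, such as a year field switching back to an author field.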