ZoomNet for Topic-Oriented Fragment Recognition in Long Documents

IEEE Access (2022)

Abstract
This work introduces a new information extraction task, Topic-Oriented Fragment Recognition (TOFR), whose goal is to recognize the information related to a specific topic in long documents from professional fields. In this paper, we introduce two TOFR datasets to study the problems of processing long documents. We propose a novel neural framework, the Zooming Network (ZoomNet), which overcomes the challenge of combining information across long distances with limited computing resources by flexibly switching between skimming and intensive reading as it processes a long document. ZoomNet first builds a hierarchical representation aligned with the text structure, which relieves the conflict between local information and broad contextual information. It then synthesizes information from different levels to assign tags via multi-scale actions. We train the model with a combination of supervised and reinforcement learning. Experiments show that the proposed model outperforms several state-of-the-art sequence labeling models, including BiLSTM-CRF, BERT, XLNet, RoBERTa, and ELECTRA, by large margins on both TOFR datasets.
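The abstract describes ZoomNet only at a high level. The following is a toy sketch of the general skim-versus-zoom idea, not ZoomNet itself. It assumes a three-level hierarchy (document, paragraph, sentence) built by mean-pooling token vectors, a dot-product relevance score against a hypothetical topic vector, and hand-set thresholds. In ZoomNet the representations and the action policy are learned (the abstract says with a combination of supervised and reinforcement learning), so the names `build_hierarchy`, `label_document`, `skip_below`, and `accept_above` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: a document is a list of paragraphs,
# a paragraph is a list of sentences, a sentence is a list of token vectors.
Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Node:
    vector: Vector
    children: List["Node"]


def build_hierarchy(doc: List[List[List[Vector]]]) -> Node:
    """Pool tokens -> sentences -> paragraphs -> document (aligned to text structure)."""
    paragraphs = []
    for para in doc:
        sentences = [Node(mean_pool(tokens), []) for tokens in para]
        paragraphs.append(Node(mean_pool([s.vector for s in sentences]), sentences))
    # Document-level vector: unused by this toy labeler, kept to show the full hierarchy.
    return Node(mean_pool([p.vector for p in paragraphs]), paragraphs)


def label_document(root: Node, topic: Vector,
                   skip_below: float = 0.2,
                   accept_above: float = 0.8) -> Tuple[List[int], List[str]]:
    """Assign a 0/1 tag per sentence using coarse (paragraph) or fine (sentence) actions."""
    labels: List[int] = []
    actions: List[str] = []
    for para in root.children:
        score = dot(para.vector, topic)
        if score < skip_below:        # skim: tag the whole paragraph as off-topic
            actions.append("skip")
            labels.extend([0] * len(para.children))
        elif score > accept_above:    # skim: tag the whole paragraph as on-topic
            actions.append("accept")
            labels.extend([1] * len(para.children))
        else:                         # zoom in: read and tag sentence by sentence
            actions.append("zoom")
            labels.extend(1 if dot(s.vector, topic) >= 0.5 else 0
                          for s in para.children)
    return labels, actions


if __name__ == "__main__":
    topic = [1.0, 0.0]
    doc = [
        [[[1, 0], [1, 0]], [[1, 0]]],  # clearly on-topic paragraph
        [[[0, 1]]],                    # clearly off-topic paragraph
        [[[1, 0]], [[0, 1]]],          # mixed paragraph, needs a closer look
    ]
    labels, actions = label_document(build_hierarchy(doc), topic)
    print(labels)   # [1, 1, 0, 1, 0]
    print(actions)  # ['accept', 'skip', 'zoom']
```

The point of the design is that only ambiguous paragraphs pay the cost of fine-grained reading; clearly relevant or irrelevant ones are tagged in a single coarse step.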
Keywords
Task analysis, Labeling, Decoding, Context modeling, Encoding, Information retrieval, Computational modeling, Information extraction, neural network, long documents, reinforcement learning