Joint Learning for Document-Level Threat Intelligence Relation Extraction and Coreference Resolution Based on GCN

2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)(2020)

引用 3|浏览15
暂无评分
摘要
In order to help researchers quickly understand the connection between new threat events and previous threat events, threat intelligence document-level relation extraction plays a very important role in threat intelligence text analysis and processing. Because there is no public document-level threat intelligence dataset, we create APTERC-DOC, an APT intelligence entities, relations and coreference dataset. We treat the relation extraction as a multi-classification task. Treating the coreference relation as a kind of predefined relations, we develop a joint learning framework called TIRECO, a model which can simultaneously complete threat intelligence relation extraction and coreference resolution. In order to solve the problem of document-level text being too long to extract feature, we propose the concept of sentence set, which transforms document-level relation extraction into inter-sentence relation extraction. To incorporate relevant information with maximally removing irrelevant content in sentence set, we further apply a novel pruning strategy (SDP-VP-SET) to the input trees considering that verbs are crucial in determining the relation between entities in sentence set. With retaining the shortest path and nodes that are K hops away from the shortest path, we give the edge connected to the verb nodes a weight of w times. Experimental results show that our model not only performs well in the extraction of inter-sentence relations, it is also effective in intra-sentence relations, and the F1 value has increased by 15.694%.
更多
查看译文
关键词
relation extraction,coreference resolution,threat intelligence,APTERC-DOC,GCN,SDP-VP-SET
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要