Cyber-Security Knowledge Graph Generation by Hierarchical Nonnegative Matrix Factorization
arXiv (2024)
Abstract
Much of human knowledge in cybersecurity is encapsulated within the
ever-growing volume of scientific papers. As this textual data continues to
expand, document organization methods become increasingly crucial for
extracting actionable insights hidden within large text datasets.
Knowledge Graphs (KGs) serve as a means to store factual information in a
structured manner, providing explicit, interpretable knowledge that includes
domain-specific information from the cybersecurity scientific literature. One
of the challenges in constructing a KG from scientific literature is the
extraction of ontology from unstructured text. In this paper, we address this
challenge and introduce a method for building a multi-modal KG by extracting
structured ontology from scientific papers. We demonstrate this concept in the
cybersecurity domain. One modality of the KG represents observable information
from the papers, such as the categories in which they were published or the
authors. The second modality uncovers latent (hidden) patterns of text
extracted through hierarchical and semantic non-negative matrix factorization
(NMF), such as named entities, topics or clusters, and keywords. We illustrate
this concept by consolidating more than two million scientific papers uploaded
to arXiv into the cyber-domain, using hierarchical and semantic NMF, and by
building a cyber-domain-specific KG.
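
The two modalities described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses a plain multiplicative-update NMF applied recursively as a simple stand-in for hierarchical and semantic NMF, and it omits named-entity extraction. The toy corpus, the paper ids, the relation names (`in_category`, `authored_by`, `belongs_to`, `has_keyword`, `subtopic_of`), and all function names are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical toy corpus: ids, authors, categories, and text are invented.
PAPERS = {
    "p1": {"authors": ["A. Author"], "category": "cs.CR",
           "text": "malware detection malware classifier"},
    "p2": {"authors": ["B. Author"], "category": "cs.CR",
           "text": "malware ransomware detection"},
    "p3": {"authors": ["A. Author"], "category": "cs.LG",
           "text": "phishing email phishing url"},
    "p4": {"authors": ["C. Author"], "category": "cs.CR",
           "text": "phishing url email filter"},
}

def term_matrix(papers):
    """Build a documents-by-terms count matrix from whitespace tokens."""
    vocab = sorted({w for p in papers.values() for w in p["text"].split()})
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(papers), len(vocab)))
    for i, p in enumerate(papers.values()):
        for w in p["text"].split():
            X[i, index[w]] += 1.0
    return X, vocab

def nmf(X, k, iters=200, seed=0):
    """Plain multiplicative-update NMF: X is approximated by W @ H, W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + 1e-3
    H = rng.random((k, X.shape[1])) + 1e-3
    eps = 1e-9  # avoids division by zero
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def observable_triples(papers):
    """First modality: metadata that is directly observable per paper."""
    triples = []
    for pid, p in papers.items():
        triples.append((pid, "in_category", p["category"]))
        triples += [(pid, "authored_by", a) for a in p["authors"]]
    return triples

def latent_triples(X, vocab, doc_ids, k=2, depth=2, top_n=2, parent="root"):
    """Second modality: recursively split documents into topic clusters."""
    if depth == 0 or len(doc_ids) < k:
        return []
    W, H = nmf(X, k)
    labels = W.argmax(axis=1)  # assign each document to its strongest topic
    triples = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue
        node = f"{parent}/{c}"
        triples.append((node, "subtopic_of", parent))
        # The highest-weight terms in row c of H act as the topic's keywords.
        for j in np.argsort(H[c])[::-1][:top_n]:
            triples.append((node, "has_keyword", vocab[j]))
        ids = [doc_ids[i] for i in members]
        triples += [(d, "belongs_to", node) for d in ids]
        # Factorize the cluster's own sub-matrix to get the next level.
        triples += latent_triples(X[members], vocab, ids, k, depth - 1, top_n, node)
    return triples

X, vocab = term_matrix(PAPERS)
kg = observable_triples(PAPERS) + latent_triples(X, vocab, list(PAPERS))
for triple in kg:
    print(triple)
```

Each printed tuple is a (subject, relation, object) edge. Storing both modalities as triples over the same paper ids is one simple way to let observable metadata and latent topics share a single graph; a real system would likely load them into a graph database instead.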