AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports
arXiv (2024)
Abstract
Monitoring the threat landscape to be aware of actual or potential attacks is
of utmost importance to cybersecurity professionals. Information about cyber
threats is typically distributed using natural language reports. Natural
language processing can help with managing this large amount of unstructured
information, yet to date, the topic has received little attention. With this
paper, we present AnnoCTR, a new CC-BY-SA-licensed dataset of cyber threat
reports. The reports have been annotated by a domain expert with named
entities, temporal expressions, and cybersecurity-specific concepts including
implicitly mentioned techniques and tactics. Entities and concepts are linked
to Wikipedia and the MITRE ATT&CK knowledge base, the most widely-used taxonomy
for classifying types of attacks. Prior datasets linking to MITRE ATT&CK either
provide a single label per document or annotate sentences out-of-context; our
dataset annotates entire documents in a much finer-grained way. In an
experimental study, we model the annotations of our dataset using
state-of-the-art neural models. In our few-shot scenario, we find that for
identifying the MITRE ATT&CK concepts that are mentioned explicitly or
implicitly in a text, concept descriptions from MITRE ATT&CK are an effective
source for training data augmentation.