The Utility of Context When Extracting Entities From Legal Documents

Jonathan Donnelly,Adam Roegiest

CIKM '20: The 29th ACM International Conference on Information and Knowledge Management Virtual Event Ireland October, 2020(2020)

引用 5|浏览14
暂无评分
摘要
When reviewing documents for legal tasks such as Mergers and Acquisitions, granular information (such as start dates and exit clauses) need to be identified and extracted. Inspired by previous work in Named Entity Recognition (NER), we investigate how NER techniques can be leveraged to aid lawyers in this review process. Due to the extremely low prevalence of target information in legal documents, we find that the traditional approach of tagging all sentences in a document is inferior, in both effectiveness and data required to train and predict, to using a first-pass layer to identify sentences that are likely to contain the relevant information and then running the more traditional sentence-level sequence tagging. Moreover, we find that such entity-level models can be improved by training on a balanced sample of relevant and non-relevant sentences. We additionally describe the use of our system in production and how its usage by clients means that deep learning architectures tend to be cost inefficient, especially with respect to the necessary time to train models.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要