Generating and applying data extraction templates

user-5bd69975530c70d56f390249(2019)

Citations: 5 | Views: 33
Abstract
Methods, apparatus, and computer-readable media are provided for generating and applying data extraction templates. In various implementations, a corpus of structured communications, such as emails, may be grouped into clusters based on one or more similarities between the structured communications. A set of structural paths may be identified from the structured communications of a particular cluster. One or more structural paths of the set may be classified as transient when a count of occurrences of one or more associated segments of text across the particular cluster satisfies a criterion. One or more transient paths may be assigned a semantic data type and/or a confidentiality designation based on various signals. A data extraction template may then be generated to extract, from subsequent structured communications, the segments of text associated with transient (and, in some cases, non-confidential) structural paths.
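The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the method of the publication, only one possible reading under stated assumptions: emails are well-formed HTML without void elements, "similarity" means an identical set of structural paths, and a path counts as transient when its most frequent text appears in at most a share `max_share` of the cluster's emails. All names (`PathCollector`, `extract_paths`, `cluster_by_structure`, `build_template`, `apply_template`, `max_share`) are hypothetical. Semantic data types and confidentiality designations are not covered.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Collects (structural path, text) pairs from one HTML document.

    Assumes well-formed markup in which every start tag has an end tag.
    """

    def __init__(self):
        super().__init__()
        self.stack = []                  # path steps such as "p[2]"
        self.child_counts = [Counter()]  # per open element: tag -> children seen
        self.pairs = []                  # (path, text) for non-empty text nodes

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        counts[tag] += 1
        self.stack.append(f"{tag}[{counts[tag]}]")
        self.child_counts.append(Counter())

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()
            self.child_counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.pairs.append(("/" + "/".join(self.stack), text))


def extract_paths(html):
    """Return all (path, text) pairs found in one email body."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


def cluster_by_structure(emails):
    """Group emails that share exactly the same set of structural paths."""
    clusters = defaultdict(list)
    for html in emails:
        signature = frozenset(path for path, _ in extract_paths(html))
        clusters[signature].append(html)
    return list(clusters.values())


def build_template(cluster, max_share=0.5):
    """Return the sorted paths classified as transient for this cluster."""
    texts_by_path = defaultdict(Counter)
    for html in cluster:
        # Count each (path, text) pair once per email.
        for path, text in set(extract_paths(html)):
            texts_by_path[path][text] += 1
    transient = []
    for path, counts in texts_by_path.items():
        most_common_count = counts.most_common(1)[0][1]
        # Assumed criterion: no single text dominates the cluster.
        if most_common_count / len(cluster) <= max_share:
            transient.append(path)
    return sorted(transient)


def apply_template(template, html):
    """Extract the text at the template's transient paths from a new email."""
    wanted = set(template)
    return {path: text for path, text in extract_paths(html) if path in wanted}
```

In this sketch, three order-confirmation emails that differ only in the order number would yield a template containing the single path of the order-number paragraph, while the repeated greeting text would be treated as fixed boilerplate and left out of the extraction.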
Keywords
Data extraction,Semantic data model,Data mining,Computer science,Template,Implementation,Cluster (physics),Confidentiality