A proposition for resilient graph-based record linkage using parallel processing on distributed networks

2015 Resilience Week (RWS)(2015)

Abstract
Industrial and governmental organizations have accrued vast amounts of data contained in many databases. Many of these databases were developed by different organizations for different purposes, may contain millions of unique entities, and may lack a dependable globally unique identifier to link an individual's records across multiple databases. Record Linkage (RL) is a process that connects records from multiple heterogeneous databases that refer to the same or a sufficiently similar entity [1]. Whether the RL system uses a deterministic or a probabilistic [2] methodology, it must compare the data within each pair of candidate records, field by field. Demographic and other data are used for pattern matching to determine whether two records belong to the same entity. RL is a data- and compute-intensive, mission-critical process for many organizations. The process must be efficient enough to process big data, effective enough to provide accurate matches, and resilient enough to ensure reliable operation.
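The field-by-field comparison of a candidate record pair can be illustrated with a minimal Python sketch. It is not the authors' implementation. The field names and the m/u agreement probabilities are hypothetical values chosen for illustration, and the probabilistic score uses Fellegi-Sunter-style log-likelihood weights, a common choice that the abstract itself does not specify.

```python
import math

# Hypothetical fields and probabilities, for illustration only.
# m = P(field agrees | records refer to the same entity)
# u = P(field agrees | records refer to different entities)
FIELD_WEIGHTS = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "zip_code": (0.90, 0.05),
}


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(str(value).lower().split())


def field_agreements(rec_a, rec_b):
    """Compare two records field by field; a missing value counts as disagreement."""
    result = {}
    for field in FIELD_WEIGHTS:
        a, b = rec_a.get(field), rec_b.get(field)
        result[field] = (
            a is not None and b is not None and normalize(a) == normalize(b)
        )
    return result


def deterministic_match(rec_a, rec_b):
    """Deterministic rule: declare a match only if every field agrees."""
    return all(field_agreements(rec_a, rec_b).values())


def probabilistic_score(rec_a, rec_b):
    """Sum of log2 likelihood ratios: positive for agreement, negative otherwise."""
    score = 0.0
    for field, agrees in field_agreements(rec_a, rec_b).items():
        m, u = FIELD_WEIGHTS[field]
        score += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return score


record_a = {"last_name": "Smith", "birth_date": "1980-02-14", "zip_code": "83401"}
record_b = {"last_name": "SMITH ", "birth_date": "1980-02-14", "zip_code": "83402"}

is_exact = deterministic_match(record_a, record_b)
score = probabilistic_score(record_a, record_b)
print(is_exact, round(score, 2))
```

In this example the two records disagree on one field, so the deterministic rule rejects the pair, while the probabilistic score stays positive because the agreeing fields carry more evidence than the single disagreement. A real system would compare the score against tuned thresholds to decide match, non-match, or manual review.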
Keywords
resilient graph-based record linkage, parallel processing, distributed networks, industrial organizations, governmental organizations, heterogeneous databases, RL system, deterministic methodology, probabilistic methodology, pattern matching, mission critical process, Big Data processing