Encryption-based sub-string matching for privacy-preserving record linkage

Journal of Information Security and Applications (2024)

Accurate and secure string matching in record linkage is increasingly important in application domains such as bioinformatics, healthcare, and crime detection. Most existing privacy-preserving string matching techniques provide only an overall similarity between a pair of strings. As a result, these techniques cannot identify the longest common sub-string between the strings in a pair, leading to lower linkage quality, while existing techniques that can identify the longest common sub-string have long runtimes. Although blocking techniques used in the record linkage pipeline reduce the time complexity, each string is generally inserted into several blocks, making it vulnerable to frequency-based attacks. In this paper, we propose two encryption-based approaches to improve the effectiveness and efficiency of string matching in record linkage. Our approaches compare strings based on the lengths of their sub-strings. In the first approach, we encrypt the sub-string lengths into individual ciphertexts and compare a pair of ciphertexts based on the corresponding sub-string. In the second approach, we encrypt multiple sub-string lengths into a single ciphertext, which allows efficient comparison of ciphertexts. We evaluate our approaches on real-world datasets and validate their accuracy, complexity, and privacy against four baselines, showing that our approaches outperform all baselines in terms of complexity and privacy while providing higher linkage quality than a standard privacy-preserving record linkage technique.
Privacy-preserving record linkage, String matching, Homomorphic encryption
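
To make the notion of a longest common sub-string concrete, the following is a minimal plaintext sketch using the standard dynamic-programming recurrence. It is not the encrypted protocol proposed in the paper: no encryption is involved, and the function name `longest_common_substring` and the example strings are assumptions chosen purely for illustration.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest contiguous sub-string shared by a and b.

    Plaintext illustration only; the paper's approaches operate on
    encrypted sub-string lengths, which this sketch does not model.
    """
    best_len = 0   # length of the best match found so far
    best_end = 0   # end index (exclusive) of that match in a
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at the previous characters
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    # Two hypothetical name variants that share a long contiguous run
    print(longest_common_substring("jonathan smith", "nathan smyth"))
```

An overall similarity score would summarise these two strings as a single number, whereas the sub-string view exposes which contiguous part actually matches and how long it is.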