Cross-source Data Error Detection Approach Based on Federated Learning.

Lu Chen, Yuxiang Guo, Congcong Ge, Baihua Zheng, Yunjun Gao

International Journal of Software and Informatics (2023)

Abstract

DOI: 10.21655/ijsi.1673-7288.00295

With the emergence and accumulation of massive data, data governance has become an important means of improving data quality and maximizing data value. Data error detection in particular is a crucial step in improving data quality and has attracted wide attention from both industry and academia. Various detection methods tailored to a single data source have been proposed. In many real-world scenarios, however, data are not centrally stored or managed. Highly correlated data from different sources can be used to improve the accuracy of error detection, but because of privacy and security concerns, cross-source data often cannot be integrated centrally. To this end, this paper proposes FeLeDetect, a cross-source data error detection method based on federated learning, which improves error detection accuracy by exploiting cross-source data information while preserving data privacy. First, a Graph-based Error Detection Model, namely GEDM, is presented to capture sufficient data features from each data source. On this basis, the paper designs a federated co-training algorithm, namely FCTA, to collaboratively train GEDM on the different cross-source data without leaking private data. Furthermore, the paper designs a series of optimization methods to reduce both the communication costs of federated learning and the manual labeling effort.
Finally, extensive experiments on three real-world datasets demonstrate that (1) GEDM achieves an average improvement of 10.3% and 25.2% in F1 score in the local and centralized scenarios, respectively, outperforming all five existing state-of-the-art methods for error detection; and (2) the F1 score of error detection by FeLeDetect is on average 23.2% higher than that of GEDM in the local scenario.
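
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1 score can be computed for error detection. It assumes, purely for illustration, that errors are flagged per cell and that a cell is identified by a (row, column) pair; the function name `f1_score` and the sample data are hypothetical and are not taken from the paper.

```python
def f1_score(predicted_errors, true_errors):
    """Harmonic mean of precision and recall over flagged cells."""
    predicted, actual = set(predicted_errors), set(true_errors)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        # Covers empty inputs and the case of no correct detections.
        return 0.0
    precision = true_positives / len(predicted)  # flagged cells that are real errors
    recall = true_positives / len(actual)        # real errors that were flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and detector output, as (row, column) pairs.
truth = {(0, "city"), (2, "zip"), (5, "name"), (7, "zip")}
flagged = {(0, "city"), (2, "zip"), (3, "name")}
score = f1_score(flagged, truth)
```

Here two of the three flagged cells are real errors and two of the four real errors are found, so precision is 2/3, recall is 1/2, and the F1 score is 4/7.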

Keywords

cross-source
