Managing Personal Identifiable Information in Data Lakes

IEEE Access (2024)

Abstract
Privacy is a fundamental human right according to the United Nations' Universal Declaration of Human Rights. The General Data Protection Regulation (GDPR), which took effect in the European Union in 2018, was a turning point in the management of personal data, specifically personal identifiable information (PII). Although many privacy laws existed before it, GDPR brought privacy into the regulatory spotlight. Its two most important novelties are the seven basic principles for processing personal data and the large fines defined for violating the regulation. Many other countries have followed the EU by adopting similar legislation. Personal data management processes in companies, especially in analytical systems and Data Lakes, must comply with these regulatory requirements. For Data Lakes, there are no standard architectures or solutions for discovering personal identifiable information, matching data about the same person from different sources, or removing expired personal data. Existing Data Lake architectures and metadata models must be upgraded to support these functionalities. The goal is to study current Data Lake architectures and metadata models and to propose enhancements that improve the collection, discovery, storage, processing, and removal of personal identifiable information. In this paper, a new metadata model that supports the handling of personal identifiable information in a Data Lake is proposed.
Keywords
Big Data applications, Metadata, Computer architecture, General Data Protection Regulation, Feature extraction, Privacy, Data lakes, Identification of persons, Knowledge discovery, Data collection, Data lake, personal identifiable information metadata, personal data, data discovery, entity linking, data removal