A Model-Agnostic Approach to Differentially Private Topic Mining
KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022)
Abstract
Topic mining extracts patterns and insights from text data (e.g., documents, emails, and product reviews), which can be used in various applications such as intent detection. However, topic mining can pose severe privacy threats to the users who contributed to the text corpus, since they can be re-identified from the text data given certain background knowledge. To the best of our knowledge, we propose the first differentially private topic mining technique (namely TopicDP), which injects well-calibrated Gaussian noise into the matrix output of any topic mining algorithm to ensure differential privacy and good utility. Specifically, we smooth the sensitivity for the Gaussian mechanism via sensitivity sampling, which addresses the major challenges resulting from the high sensitivity of topic mining under differential privacy. Furthermore, we theoretically prove the differential privacy guarantee under the Rényi differential privacy mechanism and the utility error bounds of TopicDP. Finally, we conduct extensive experiments on two real-world text datasets (Enron email and Amazon Reviews), and the experimental results demonstrate that TopicDP is a model-agnostic framework that achieves better privacy-preserving performance for topic mining than other differential privacy mechanisms.
Keywords
mining, model-agnostic
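
The noise-injection idea described in the abstract can be illustrated with a minimal sketch. This is not the TopicDP algorithm itself: it uses the classical Gaussian-mechanism noise scale (valid only for 0 < epsilon < 1) rather than the paper's Rényi-DP calibration, and it replaces the paper's sensitivity sampling with a simplified leave-one-out estimate. All function names (`gaussian_sigma`, `sampled_sensitivity`, `privatize`, `toy_topic_fn`) and the toy "topic miner" are hypothetical, introduced only for illustration. An empirical sensitivity estimate of this kind is not a worst-case bound, so the sketch alone does not carry a formal privacy guarantee.

```python
import numpy as np


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale (assumes 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon


def sampled_sensitivity(topic_fn, docs, n_samples, rng):
    """Empirical Frobenius-norm sensitivity from sampled leave-one-out neighbors.

    Simplified stand-in for sensitivity sampling: an estimate, not a bound.
    """
    base = topic_fn(docs)
    n_draws = min(n_samples, len(docs))
    worst = 0.0
    for idx in rng.choice(len(docs), size=n_draws, replace=False):
        neighbor = docs[:idx] + docs[idx + 1:]  # drop one document
        diff = np.linalg.norm(topic_fn(neighbor) - base)
        worst = max(worst, float(diff))
    return worst


def privatize(matrix, sensitivity, epsilon, delta, rng):
    """Add i.i.d. Gaussian noise to every entry of the output matrix."""
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return matrix + rng.normal(0.0, sigma, size=matrix.shape)


def toy_topic_fn(docs):
    """Hypothetical 'topic miner': mean word-count vector as a 1 x V matrix."""
    return np.mean(np.stack(docs), axis=0, keepdims=True)


rng = np.random.default_rng(0)
# Ten toy documents as bag-of-words count vectors over a 6-word vocabulary.
docs = [rng.integers(0, 5, size=6).astype(float) for _ in range(10)]
sens = sampled_sensitivity(toy_topic_fn, docs, n_samples=5, rng=rng)
noisy = privatize(toy_topic_fn(docs), sens, epsilon=0.5, delta=1e-5, rng=rng)
```

Because `privatize` only touches the output matrix, any function with the same input and output shape can be substituted for `toy_topic_fn`, which is the sense in which such a scheme is model-agnostic.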