Using Transfer Learning to Identify Privacy Leaks in Tweets

Saul Ricardo Medrano Castillo, Zhiyuan Chen

2016 IEEE 2nd International Conference on Collaboration and Internet Computing (CIC) (2016)

Abstract
Users of online social networks often disclose sensitive information, intentionally or unintentionally, allowing organizations such as governments, advertising companies, or criminals to exploit it. In this paper, we focus on identifying privacy leaks, such as being pregnant or being drunk, in the content of tweets. This problem is nontrivial for two reasons. First, we need to differentiate tweets that actually contain privacy leaks from tweets that do not; for example, a tweet may talk about a celebrity getting pregnant or advertise products for pregnant women, and thus is not privacy sensitive. Second, most existing solutions build a supervised learning model for each type of privacy leak. Because there can be many types of leaks, such solutions require labeling a large number of tweets for each type, which is tedious and does not generalize easily. Our main contribution is to apply transfer learning techniques so that training data for one type of privacy leak can be used for another type that shares some common ground but is not exactly the same. This greatly reduces the labeling effort and makes our solution more generalizable. Experimental results validate the benefit of our approach: only 7% of the data for the new type of leak needs to be labeled to achieve results similar to those obtained with 100% labeled data.
Keywords
privacy,social network,transfer learning
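
The transfer setting described in the abstract, reusing labeled tweets from one leak type to help classify another, can be illustrated with a minimal sketch. The abstract does not say which transfer learning technique the authors use, so the code below swaps in a simple, well-known stand-in: an instance-weighted Naive Bayes classifier in which the few labeled target tweets count more than the source tweets. The tweets, the weights, and the names train_weighted_nb and predict are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

# Toy data (hypothetical, not from the paper). Label 1 = personal privacy leak.
# Source leak type: "being drunk", assumed to be fully labeled.
SOURCE = [
    ("i am so drunk right now", 1),
    ("i got wasted last night", 1),
    ("celebrity arrested for drunk driving", 0),
    ("buy our hangover cure today", 0),
]
# Target leak type: "being pregnant", with only a few labeled tweets.
TARGET = [
    ("i am pregnant and so happy", 1),
    ("celebrity is pregnant again", 0),
]

def train_weighted_nb(examples):
    """Train multinomial Naive Bayes from (text, label, weight) triples."""
    class_weight = defaultdict(float)
    word_weight = {0: defaultdict(float), 1: defaultdict(float)}
    vocab = set()
    for text, label, weight in examples:
        class_weight[label] += weight
        for tok in text.lower().split():
            word_weight[label][tok] += weight  # weighted token count
            vocab.add(tok)
    return {"class_weight": class_weight, "word_weight": word_weight, "vocab": vocab}

def predict(model, text):
    """Return the label (0 or 1) with the higher Laplace-smoothed log score."""
    total = sum(model["class_weight"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        score = math.log((model["class_weight"][label] + 1.0) / (total + 2.0))
        denom = sum(model["word_weight"][label].values()) + vocab_size
        for tok in text.lower().split():
            if tok in model["vocab"]:  # tokens never seen in training are skipped
                count = model["word_weight"][label][tok]
                score += math.log((count + 1.0) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Assumed weights: each target tweet counts three times as much as a source tweet.
SOURCE_WEIGHT, TARGET_WEIGHT = 1.0, 3.0
training = [(t, y, SOURCE_WEIGHT) for t, y in SOURCE]
training += [(t, y, TARGET_WEIGHT) for t, y in TARGET]
model = train_weighted_nb(training)

print(predict(model, "i am pregnant"))
print(predict(model, "celebrity is pregnant"))
```

The point of the sketch is the shared signal: first-person wording such as "i am" marks a personal disclosure in both leak types, while wording about celebrities or products marks non-sensitive tweets, so source labels can plausibly stand in for part of the target labeling effort. The weight ratio is an arbitrary choice for illustration.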