Towards Ethical Content-Based Detection Of Online Influence Campaigns

2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), 2019

Abstract
The detection of clandestine efforts to influence users in online communities is a challenging problem with significant active development. We demonstrate that features derived from the text of user comments are useful for identifying suspect activity, but lead to increased erroneous identifications (false positive classifications) when keywords over-represented in past influence campaigns are present. Drawing on research in native language identification (NLI), we use “named entity masking” (NEM) to create sentence features robust to this shortcoming, while maintaining comparable classification accuracy. We demonstrate that while NEM consistently reduces false positives when key named entities are mentioned, both masked and unmasked models exhibit increased false positive rates on English sentences by Russian native speakers, raising ethical considerations that should be addressed in future research.
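The abstract does not say how named entity masking is implemented, so the following is only a minimal sketch of the general idea: each entity mention is replaced with a type placeholder before features are extracted, so a classifier cannot key on specific names. The gazetteer, its entries, and the function name `mask_named_entities` are hypothetical; a real system would more likely use a trained named entity recognizer than a fixed list.

```python
import re

# Hypothetical gazetteer mapping surface forms to entity types.
# These entries are illustrative only and do not come from the paper.
GAZETTEER = {
    "Jane Doe": "PERSON",
    "Springfield": "GPE",
    "Acme Corp": "ORG",
}


def _build_pattern(gazetteer):
    # Sort longest first so multi-word names win over shorter overlaps.
    names = sorted(gazetteer, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")


_PATTERN = _build_pattern(GAZETTEER)


def mask_named_entities(sentence, gazetteer=GAZETTEER, pattern=_PATTERN):
    """Replace each known entity mention with a placeholder such as [PERSON]."""
    return pattern.sub(lambda m: "[" + gazetteer[m.group(1)] + "]", sentence)


print(mask_named_entities("Jane Doe visited Springfield."))
# prints: [PERSON] visited [GPE].
```

The masked sentence, rather than the original, would then be passed to the sentence encoder, which is how masking can reduce sensitivity to keywords over-represented in past campaigns.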
Keywords
influence campaign detection, native language identification, algorithmic bias, natural language processing, bidirectional encoder representations from transformers (BERT)