Practical Black Box Model Inversion Attacks Against Neural Nets

Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Part II (2021)

Abstract
Adversarial machine learning is a set of malicious techniques that aim to exploit machine learning's underlying mathematics. Model inversion is a type of adversarial machine learning attack in which an adversary attempts to reconstruct the target model's private training data. Specifically, given black box access to a target classifier, the attacker aims to recreate a sample of a particular class using only the ability to query the model. Traditionally, these attacks have depended on the target classifier returning a confidence vector: model inversion iteratively synthesizes an image that maximizes the target model's confidence in a particular class. Our technique allows the attack to be performed when the target returns only a one-hot-encoded confidence vector. The approach begins with model extraction, i.e., training a local model to mimic the behavior of the target model; we then perform inversion on the local model, which is within our control. Through this combination, we introduce the first model inversion attack that can be performed in a true black box setting, i.e., without knowledge of the target model's architecture and using only output class labels. This is possible due to the transferability properties inherent in our model extraction approach, Jacobian Dataset Augmentation. Throughout this work, we train shallow Artificial Neural Networks (ANNs) to mimic deeper ANNs and CNNs. These shallow local models allow us to extend Fredrikson et al.'s inversion attack to more complex models than previously thought possible.
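To make the pipeline concrete, the following is a minimal PyTorch sketch of the two-stage attack the abstract describes: model extraction via Jacobian Dataset Augmentation, followed by Fredrikson-style gradient inversion run on the extracted local model. Everything here is an illustrative assumption rather than the authors' implementation: the substitute architecture (SmallMLP), the target_query function (assumed to return hard class labels only), the image range [0, 1], and all hyperparameters.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallMLP(nn.Module):
    # Shallow local substitute model (assumed architecture).
    def __init__(self, in_dim=784, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_classes))

    def forward(self, x):
        return self.net(x.flatten(1))

def extract(target_query, seed_x, rounds=3, lam=0.1, epochs=10):
    # Jacobian Dataset Augmentation: label the current points with the
    # target's hard (one-hot) predictions, fit the substitute, then step
    # each point along the sign of the substitute's Jacobian to grow the set.
    local = SmallMLP()
    x = seed_x.clone()
    for _ in range(rounds):
        y = target_query(x)                      # black box: class labels only
        opt = torch.optim.Adam(local.parameters(), lr=1e-3)
        for _ in range(epochs):                  # fit substitute on (x, y)
            opt.zero_grad()
            F.cross_entropy(local(x), y).backward()
            opt.step()
        x_aug = x.clone().requires_grad_(True)   # Jacobian-based augmentation
        local(x_aug).gather(1, y[:, None]).sum().backward()
        x = torch.cat([x, (x_aug + lam * x_aug.grad.sign()).detach().clamp(0, 1)])
    return local

def invert(local, target_class, shape=(1, 1, 28, 28), steps=500, lr=0.1):
    # Fredrikson-style inversion: gradient ascent on the local model's
    # confidence in target_class, starting from a gray image.
    x = torch.full(shape, 0.5, requires_grad=True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -F.log_softmax(local(x), dim=1)[0, target_class]
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0, 1)                       # keep a valid image
    return x.detach()

In the paper's setting, target_query would wrap the remote classifier's label-only interface; the inversion never touches the target directly, so the attack's success rests on the transferability between the substitute and the target that the abstract attributes to Jacobian Dataset Augmentation.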
Keywords
Black box model inversion attacks, Adversarial machine learning