Inversion-guided Defense: Detecting Model Stealing Attacks by Output Inverting

IEEE Transactions on Information Forensics and Security (2024)

Abstract
Model stealing attacks create unauthorized copies of machine learning models that replicate the functionality of the original model. Such attacks raise significant concerns about the intellectual property of machine learning models. However, current defense mechanisms against these attacks exhibit notable drawbacks in terms of utility and robustness. For example, watermarking-based defenses require victim models to be retrained to embed watermarks, which can degrade performance on the main task. Other defenses, especially fingerprinting-based methods, often rely on specific samples such as adversarial examples to verify ownership of the target model. These approaches may be less robust against adaptive attacks, such as model stealing combined with adversarial training. It also remains unclear whether normal examples, as opposed to adversarial ones, can effectively reflect the characteristics of stolen models. To tackle these challenges, we propose a novel method that leverages a neural network as a decoder to invert the suspicious model's outputs. Inspired by model inversion attacks, we argue that this decoding process reveals hidden patterns inherent in the suspicious model's original outputs. From these decoding outcomes, we compute specific metrics to determine the legitimacy of suspicious models. We validate the efficacy of our defense against diverse model stealing attacks on classification tasks based on deep neural networks.
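The abstract only outlines the approach at a high level. The sketch below is one possible reading of the idea, not the paper's implementation: a decoder network is trained to invert the victim classifier's output probabilities back toward the inputs, and the same decoder is then applied to a suspicious model's outputs on normal examples to compute a reconstruction-based detection metric. The decoder architecture, the MSE reconstruction score, and all function names here are assumptions for illustration.

```python
# Minimal sketch of inversion-guided detection (assumed details, PyTorch).
# A decoder learns to map the victim model's softmax outputs back to the
# (flattened) inputs; a stolen copy's outputs are expected to remain
# decodable, giving a low reconstruction error on normal examples.

import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Maps a K-dim probability vector back to a flattened input (assumed MLP)."""
    def __init__(self, num_classes: int, input_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, input_dim), nn.Sigmoid(),
        )

    def forward(self, probs: torch.Tensor) -> torch.Tensor:
        return self.net(probs)

def train_decoder(victim, decoder, loader, epochs=10, lr=1e-3, device="cpu"):
    """Fit the decoder on (victim output, input) pairs using normal examples."""
    opt = torch.optim.Adam(decoder.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    victim.eval()
    for _ in range(epochs):
        for x, _ in loader:
            x = x.to(device).flatten(1)  # assumes the classifier takes flat inputs
            with torch.no_grad():
                probs = victim(x).softmax(dim=1)
            opt.zero_grad()
            loss = loss_fn(decoder(probs), x)
            loss.backward()
            opt.step()
    return decoder

@torch.no_grad()
def inversion_score(suspect, decoder, loader, device="cpu"):
    """Mean reconstruction error of the suspect's outputs; a low score suggests a stolen copy."""
    errs = []
    for x, _ in loader:
        x = x.to(device).flatten(1)
        probs = suspect(x).softmax(dim=1)
        errs.append(((decoder(probs) - x) ** 2).mean().item())
    return sum(errs) / len(errs)
```

In use, the defender would compare inversion_score of the suspicious model against scores of independently trained reference models and flag the suspect when its score falls below a chosen threshold; the thresholding rule is likewise an assumption.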
Keywords
Model stealing attacks, model IP protection, fingerprinting, model inversion, adversarial examples