TextBugger: Generating Adversarial Text Against Real-world Applications.

NDSS, 2019


Abstract

Deep Learning-based Text Understanding (DLTU) is the backbone technique behind various applications, including question answering, machine translation, and text classification. Despite its tremendous popularity, the security vulnerabilities of DLTU are still largely unknown, which is highly concerning given its increasing use in security-sensitive applications such as sentiment analysis and toxic content detection.

Introduction
  • Deep neural networks (DNNs) have been shown to achieve great success in various tasks such as classification, regression, and decision making.
  • Though DNN models have exhibited state-of-the-art performance in many applications, they have recently been found to be vulnerable to adversarial examples, which are carefully generated by adding small perturbations to legitimate inputs in order to fool the targeted models [8, 13, 20, 25, 36, 37] (see the sketch after these bullets)
  • Such discovery has raised serious concerns, especially when deploying such machine learning models to security-sensitive tasks.
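For reference, the perturbation-based view of adversarial examples mentioned in the bullets above is commonly formalized as follows; this is a generic sketch in our own notation (x, δ, f, ε), not an equation taken from the paper.

```latex
% Generic (untargeted) adversarial example: a small perturbation \delta added
% to a legitimate input x so that the target model f changes its prediction.
\[
  x' = x + \delta, \qquad f(x') \neq f(x), \qquad \|\delta\| \le \epsilon
\]
% For text, the size of \delta is measured by edit distance or semantic
% similarity between x and x' rather than an L_p norm on pixels.
```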
Highlights
  • Deep neural networks (DNNs) have been shown to achieve great success in various tasks such as classification, regression, and decision making
  • We show that transferability exists in the text domain and that adversarial texts generated against offline models can be successfully transferred to multiple popular online Deep Learning-based Text Understanding systems
  • Experimental results are shown in Tables IX and X, from which we can see that though many generated adversarial texts can be detected by spell checking, TEXTBUGGER still has a higher success rate than DeepWordBug on multiple online platforms after correcting the misspelled words
  • We study adversarial attacks against state-of-the-art sentiment analysis and toxic content detection models/platforms under both white-box and black-box settings
  • Extensive experimental results demonstrate that TEXTBUGGER is effective and efficient at generating targeted adversarial texts
  • As shown in Table V, it perturbs only 10.3% of the words in a sample to achieve a 92.3% success rate on the LR model, while all baselines achieve no more than a 40% attack success rate
  • Our findings show the potential of spell checking and adversarial training for defending against such attacks
Methods
  • In [16], Jia et al. generated adversarial examples for evaluating reading comprehension systems by adding distracting sentences to the input document
  • Their method requires manual intervention to polish the added sentences.
  • In [40], Zhao et al. used Generative Adversarial Networks (GANs) to generate adversarial sequences for textual entailment and machine translation applications
  • This method requires neural text generation, which is limited to short texts.
Results
  • Experimental results are shown in Tables IX and X, from which the authors can see that though many generated adversarial texts can be detected by spell checking, TEXTBUGGER still has a higher success rate than DeepWordBug on multiple online platforms after correcting the misspelled words (a minimal sketch of this spell-checking defense follows these bullets)
  • When targeting the Perspective API, TEXTBUGGER has a 35.6% success rate, while DeepWordBug has only 16.5% after spell checking.
  • This means TEXTBUGGER is still effective and stronger than DeepWordBug. Further, the authors analyze the difficulty of correcting each kind of bug.
  • The hardest bug to correct is SubW, which has a correction success rate of less than 10%
  • This phenomenon partly accounts for why TEXTBUGGER is stronger than DeepWordBug
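To make the spell-checking defense discussed above concrete, here is a minimal sketch, assuming a toy vocabulary and a classic edit-distance-1 corrector; the VOCAB, correct_token, and defend names are ours for illustration and this is not the paper's implementation.

```python
# Minimal sketch of a spell-checking defense: before classification, map each
# out-of-vocabulary token back to an in-vocabulary word within edit distance 1.
import string

VOCAB = {"this", "movie", "is", "terrible", "great", "boring", "foolish"}

def edits1(word):
    """All strings within edit distance 1 of `word` (deletes, transposes,
    replaces, inserts), as in a classic spell corrector."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct_token(token):
    """Keep in-vocabulary tokens; otherwise return the alphabetically first
    in-vocabulary candidate within edit distance 1, or the token unchanged."""
    if token in VOCAB:
        return token
    candidates = edits1(token) & VOCAB
    return min(candidates) if candidates else token

def defend(text):
    """Spell-correct a whole input before it is passed to the classifier."""
    return " ".join(correct_token(t) for t in text.lower().split())

if __name__ == "__main__":
    print(defend("this movie is terrib1e"))  # character substitution corrected
    print(defend("this movie is trerible"))  # adjacent-character swap corrected
```

A corrector like this undoes many character-level bugs, which is why word-level substitutions (SubW) remain the hardest to correct, as noted above.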
Conclusion
  • Extension to Targeted Attack.
  • Extensive experimental results demonstrate that TEXTBUGGER is effective and efficient at generating targeted adversarial texts.
  • The transferability of such examples hints at potential vulnerabilities in many real applications, including text filtering systems, online recommendation systems, etc.
  • The authors' findings show the potential of spell checking and adversarial training for defending against such attacks.
  • Ensembles of linguistically-aware or structurally-aware defense systems can be further explored to improve robustness
Tables
  • Table 1: EXAMPLES FOR FIVE BUG GENERATION METHODS (see the illustrative sketch after this list)
  • Table 2: RESULTS OF THE WHITE-BOX ATTACK ON KAGGLE DATASET
  • Table 3: RESULTS OF SC ON KAGGLE DATASET
  • Table 4: RESULTS OF THE WHITE-BOX ATTACKS ON IMDB AND MR DATASETS
  • Table 5: RESULTS OF THE BLACK-BOX ATTACK ON MR
  • Table 6: RESULTS OF SC ON IMDB AND MR DATASETS
  • Table 7: RESULTS OF THE BLACK-BOX ATTACK ON KAGGLE DATASET
  • Table 8: RESULTS OF AT ON THREE DATASETS
  • Table 9: RESULTS OF THE BLACK-BOX ATTACK ON IMDB
  • Table 10: TRANSFERABILITY ON IMDB AND MR DATASETS
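To make Table 1 concrete, the following is a minimal sketch of the character-level bug types TEXTBUGGER draws on (insert, delete, swap, substitute-C); the word-level substitute-W bug (nearest-neighbor word in embedding space) is omitted, SUB_C_MAP is only an assumed subset of visually similar substitutions, and the selection strategy here is simplified relative to the paper.

```python
# Illustrative character-level bug generators in the spirit of Table 1.
# These are simplified sketches, not TEXTBUGGER's exact implementation.
import random

# Assumed subset of visually similar character substitutions (Sub-C).
SUB_C_MAP = {"o": "0", "l": "1", "i": "1", "a": "@", "e": "3"}

def bug_insert(word, rng=random):
    """Insert a space inside the word so tokenizers split it."""
    if len(word) < 2:
        return word
    pos = rng.randint(1, len(word) - 1)
    return word[:pos] + " " + word[pos:]

def bug_delete(word, rng=random):
    """Delete a random interior character."""
    if len(word) < 3:
        return word
    pos = rng.randint(1, len(word) - 2)
    return word[:pos] + word[pos + 1:]

def bug_swap(word, rng=random):
    """Swap two adjacent interior characters."""
    if len(word) < 4:
        return word
    pos = rng.randint(1, len(word) - 3)
    return word[:pos] + word[pos + 1] + word[pos] + word[pos + 2:]

def bug_sub_c(word):
    """Replace the first character that has a visually similar counterpart."""
    for i, ch in enumerate(word):
        if ch in SUB_C_MAP:
            return word[:i] + SUB_C_MAP[ch] + word[i + 1:]
    return word

if __name__ == "__main__":
    for bug in (bug_insert, bug_delete, bug_swap, bug_sub_c):
        print(bug.__name__, "->", bug("foolish"))
```

Such perturbations keep the text readable to humans while pushing the token outside the model's vocabulary or toward a different embedding, which is the intuition behind the bug types in Table 1.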
Funding
  • This work was partly supported by NSFC under No. 61772466, the Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars under No. LR19F020003, the Provincial Key Research and Development Program of Zhejiang, China under No. 2017C01055, and the Alibaba-ZJU Joint Research Institute of Frontier Technologies
  • Ting Wang is partially supported by the National Science Foundation under Grants No. 1566526 and 1718787
  • Bo Li is partially supported by the Defense Advanced Research Projects Agency (DARPA)
Study subjects and analysis
popular public benchmark datasets: 2
Then we will analyze the results and discuss potential reasons for the observed performance. We study adversarial examples of text on two popular public benchmark datasets for sentiment analysis. The final adversarial examples are generated and evaluated on the test set

samples: 2
Euclidean distance is a measure of the true straight-line distance between two points in Euclidean space. If p = (p_1, p_2, ..., p_n) and q = (q_1, q_2, ..., q_n) are two samples in the word vector space, then the Euclidean distance between p and q is given by: $d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2}$ (5). In our experiment, the Euclidean space is exactly the word vector space
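A minimal numeric sketch of Eq. (5); the three-dimensional vectors below are made-up stand-ins for word embeddings, purely for illustration.

```python
# Euclidean distance between two word vectors, as in Eq. (5).
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points in the word-vector space."""
    assert len(p) == len(q), "vectors must have the same dimensionality"
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Toy 3-dimensional "embeddings" purely for illustration.
p = [0.1, -0.4, 0.7]
q = [0.2, -0.1, 0.5]
print(euclidean_distance(p, q))  # ~0.374
```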

datasets: 3
Therefore, we wonder whether adversarial texts also have this property. In this evaluation, we generated adversarial texts on all three datasets for LR, CNN, and LSTM models. Then, we evaluated the attack success rate of the generated adversarial texts against other models/platforms
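A minimal sketch of this cross-model transferability evaluation, assuming placeholder models and adversarial examples; the names transfer_matrix, models, and adv_examples are ours, not the paper's.

```python
# Sketch of the transferability evaluation: adversarial texts generated against
# one (source) model are fed to every other (target) model, and the fraction
# that still flips the prediction is the transfer success rate.

def transfer_matrix(models, adv_examples):
    """models: dict name -> callable returning a label for a text.
    adv_examples: dict source_name -> list of (adv_text, true_label)."""
    matrix = {}
    for src, examples in adv_examples.items():
        for tgt, predict in models.items():
            if src == tgt or not examples:
                continue
            fooled = sum(1 for text, label in examples if predict(text) != label)
            matrix[(src, tgt)] = fooled / len(examples)
    return matrix

if __name__ == "__main__":
    # Toy stand-in "classifiers" keyed on a trivial keyword feature.
    models = {
        "LR":   lambda t: int("terrible" in t),
        "CNN":  lambda t: int("terrib" in t),
        "LSTM": lambda t: int("bad" in t or "terrible" in t),
    }
    adv = {"LR": [("this movie is terrib1e", 1), ("what a terrib1e plot", 1)]}
    print(transfer_matrix(models, adv))
```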

legitimate samples: 500
Before the study, we consulted with the IRB office and this study was approved; we did not collect any information about participants other than the necessary result data. First, we randomly sampled 500 legitimate samples and 500 adversarial samples from the IMDB and Kaggle datasets, respectively. Among them, half were generated under white-box settings and half under black-box settings

annotations from different users: 3
Meanwhile, we also asked them to mark the suspicious words or inappropriate expressions in the samples. To avoid labeling bias, we allow each user to annotate at most 20 reviews and collect 3 annotations from different users for each sample. Finally, 3,177 valid annotations from 297 AMT workers were obtained in total

AMT workers: 297
To avoid labeling bias, we allow each user to annotate at most 20 reviews and collect 3 annotations from different users for each sample. Finally, 3,177 valid annotations from 297 AMT workers were obtained in total. After examining the results, we find that 95.5% legitimate

References
  • [1] M. Alzantot, Y. Sharma, A. Elgohary, B.-J. Ho, M. Srivastava, and K.-W. Chang, "Generating natural language adversarial examples," arXiv preprint arXiv:1804.07998, 2018.
  • [2] M. Barreno, B. Nelson, A. D. Joseph, and J. Tygar, "The security of machine learning," Machine Learning, vol. 81, no. 2, pp. 121–148, 2010.
  • [3] M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar, "Can machine learning be secure?" in ASIACCS, 2006, pp. 16–25.
  • [4] Y. Belinkov and Y. Bisk, "Synthetic and natural noise both break neural machine translation," arXiv preprint arXiv:1711.02173, 2017.
  • [5] B. Biggio, G. Fumera, and F. Roli, "Design of robust classifiers for adversarial environments," in SMC, 2011, pp. 977–982.
  • [6] N. Carlini and D. Wagner, "Towards evaluating the robustness of neural networks," in S&P, 2017, pp. 39–57.
  • [7] D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar et al., "Universal sentence encoder," arXiv preprint arXiv:1803.11175, 2018.
  • [8] M. Cheng, J. Yi, H. Zhang, P.-Y. Chen, and C.-J. Hsieh, "Seq2Sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples," arXiv preprint arXiv:1803.01128, 2018.
  • [9] J. Ebrahimi, A. Rao, D. Lowd, and D. Dou, "HotFlip: White-box adversarial examples for NLP," arXiv preprint arXiv:1712.06751, 2017.
  • [10] I. Evtimov, K. Eykholt, E. Fernandes, T. Kohno, B. Li, A. Prakash, A. Rahmati, and D. Song, "Robust physical-world attacks on machine learning models," arXiv preprint arXiv:1707.08945, 2017.
  • [11] J. Gao, J. Lanchantin, M. L. Soffa, and Y. Qi, "Black-box generation of adversarial text sequences to evade deep learning classifiers," arXiv preprint arXiv:1801.04354, 2018.
  • [12] Z. Gong, W. Wang, B. Li, D. Song, and W.-S. Ku, "Adversarial texts with gradient methods," arXiv preprint arXiv:1801.07175, 2018.
  • [13] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," in ICLR, 2015, pp. 1–11.
  • [14] H. Hosseini, S. Kannan, B. Zhang, and R. Poovendran, "Deceiving Google's Perspective API built for detecting toxic comments," arXiv preprint arXiv:1702.08138, 2017.
  • [15] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. Tygar, "Adversarial machine learning," in AISec, 2011, pp. 43–58.
  • [16] R. Jia and P. Liang, "Adversarial examples for evaluating reading comprehension systems," in EMNLP, 2017, pp. 2021–2031.
  • [17] Y. Kim, "Convolutional neural networks for sentence classification," in EMNLP, 2014, pp. 1746–1751.
  • [18] Y. Li, T. Cohn, and T. Baldwin, "Learning robust representations of text," in EMNLP, 2016, pp. 1979–1985.
  • [19] B. Liang, H. Li, M. Su, P. Bian, X. Li, and W. Shi, "Deep text classification can be fooled," arXiv preprint arXiv:1704.08006, 2017.
  • [20] X. Ling, S. Ji, J. Zou, J. Wang, C. Wu, B. Li, and T. Wang, "DEEPSEC: A uniform platform for security analysis of deep learning model," in IEEE S&P, 2019.
  • [21] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, "Learning word vectors for sentiment analysis," in ACL, 2011, pp. 142–150.
  • [22] W. Medhat, A. Hassan, and H. Korashy, "Sentiment analysis algorithms and applications: A survey," Ain Shams Engineering Journal, vol. 5, no. 4, pp. 1093–1113, 2014.
  • [23] T. Miyato, A. M. Dai, and I. Goodfellow, "Adversarial training methods for semi-supervised text classification," in ICLR, 2017.
  • [24] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, "DeepFool: A simple and accurate method to fool deep neural networks," in CVPR, 2016, pp. 2574–2582.
  • [25] A. Nguyen, J. Yosinski, and J. Clune, "Deep neural networks are easily fooled: High confidence predictions for unrecognizable images," in CVPR, 2015, pp. 427–436.
  • [26] C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, and Y. Chang, "Abusive language detection in online user content," in WWW, 2016, pp. 145–153.
  • [27] B. Pang and L. Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales," in ACL, 2005, pp. 115–124.
  • [28] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, "Practical black-box attacks against machine learning," in Asia CCS, 2017, pp. 506–519.
  • [29] N. Papernot, P. McDaniel, A. Swami, and R. Harang, "Crafting adversarial input sequences for recurrent neural networks," in MILCOM, 2016, pp. 49–54.
  • [30] J. Pennington, R. Socher, and C. Manning, "GloVe: Global vectors for word representation," in EMNLP, 2014, pp. 1532–1543.
  • [31] G. Rawlinson, "The significance of letter position in word recognition," IEEE Aerospace and Electronic Systems Magazine, vol. 22, no. 1, pp. 26–27, 2007.
  • [32] M. T. Ribeiro, S. Singh, and C. Guestrin, "Semantically equivalent adversarial rules for debugging NLP models," in ACL, 2018.
  • [33] S. Samanta and S. Mehta, "Towards crafting text adversarial samples," arXiv preprint arXiv:1707.02812, 2017.
  • [34] D. Sculley, G. Wachman, and C. E. Brodley, "Spam filtering using inexact string matching in explicit feature space with on-line linear classifiers," in TREC, 2006.
  • [35] C. E. Shannon, "Communication theory of secrecy systems," Bell System Technical Journal, vol. 28, no. 4, pp. 656–715, 1949.
  • [36] C. Szegedy et al., "Intriguing properties of neural networks," in ICLR, 2014, pp. 1–10.
  • [37] C. Xiao, B. Li, J.-Y. Zhu, W. He, M. Liu, and D. Song, "Generating adversarial examples with adversarial networks," arXiv preprint arXiv:1801.02610, 2018.
  • [38] X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in NIPS, 2015, pp. 649–657.
  • [39] Y. Zhang and B. Wallace, "A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification," in IJCNLP, vol. 1, 2017, pp. 253–263.
  • [40] Z. Zhao, D. Dua, and S. Singh, "Generating natural adversarial examples," in ICLR, 2018.