The Pitfalls of Simplicity Bias in Neural Networks

NeurIPS 2020.

Keywords: complex feature, multiple simple features, simple feature, adversarial example, extreme SB

Abstract:

Several works have proposed Simplicity Bias (SB)---the tendency of standard training procedures such as Stochastic Gradient Descent (SGD) to find simple models---to justify why neural networks generalize well [Arpit et al. 2017, Nakkiran et al. 2019, Valle-Perez et al. 2019]. However, the precise notion of simplicity remains vague. Furt...
Introduction
  • The surprisingly good generalization ability of neural networks, despite their high capacity to fit even randomly labeled data [78], has been a subject of intense study.
  • The authors prove that NNs trained with standard mini-batch gradient descent (GD) on the LSN dataset learn a classifier that relies exclusively on the "simple" linear coordinate, exhibiting simplicity bias at the cost of margin.
Highlights
  • The surprisingly good generalization ability of neural networks, despite their high capacity to fit even randomly labeled data [78], has been a subject of intense study
  • Maximum-margin classifiers are inherently robust to perturbations of data at prediction time, and this implication is at odds with concrete evidence that neural networks, in practice, are brittle to adversarial examples [66] and distribution shifts [50, 55, 42, 61]
  • Given the important implications of Simplicity Bias (SB), we hope these datasets serve as a useful testbed for devising better training procedures
  • We consistently observe that Stochastic Gradient Descent (SGD)-trained models on the LMS-5 and MS-(5,7) datasets exhibit extreme SB: they rely exclusively on the simplest feature S and remain invariant to all complex features Sc (a toy data-generation sketch for such datasets follows this list)
  • Through theoretical analysis and extensive experiments on synthetic and image-based datasets, we (a) establish that SB is extreme in nature across model architectures and datasets and (b) show that extreme SB can result in poor OOD performance and adversarial vulnerability, even when all simple and complex features have equal predictive power
  • We investigated Simplicity Bias (SB) in SGD-trained neural networks (NNs) using synthetic and image-based datasets that (a) incorporate a precise notion of feature simplicity, (b) are amenable to theoretical analysis and (c) capture subtleties of trained NNs in practice
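
To make the notion of feature simplicity concrete, the synthetic datasets pair a linearly separable coordinate with k-slab coordinates that are equally predictive but require a more complex decision boundary. The Python/NumPy sketch below generates a toy two-coordinate "linear + k-slab" dataset in this spirit; the function name, margin, and slab widths are illustrative choices and do not reproduce the paper's exact LMS-5 or MS-(5,7) construction.

    # Toy "linear + k-slab" data: both coordinates fully predict the binary label,
    # but the linear coordinate is simpler than the k-slab coordinate.
    # (Illustrative sketch only; margins/widths differ from the paper's datasets.)
    import numpy as np

    def make_linear_slab_data(n, k=5, margin=0.1, seed=0):
        rng = np.random.default_rng(seed)
        y = rng.integers(0, 2, size=n)

        # Linear coordinate: sign encodes the label, with a margin around zero.
        x_lin = rng.uniform(margin, 1.0, size=n) * np.where(y == 1, 1.0, -1.0)

        # k-slab coordinate: split [-1, 1] into k slabs with alternating labels
        # and sample each point inside a slab whose label matches y.
        edges = np.linspace(-1.0, 1.0, k + 1)
        slab_labels = np.arange(k) % 2
        x_slab = np.empty(n)
        for i in range(n):
            s = rng.choice(np.flatnonzero(slab_labels == y[i]))
            lo, hi = edges[s], edges[s + 1]
            pad = 0.1 * (hi - lo)                    # gap between adjacent slabs
            x_slab[i] = rng.uniform(lo + pad, hi - pad)

        return np.column_stack([x_lin, x_slab]), y

    X, y = make_linear_slab_data(1000, k=5)
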
Results
  • The authors consistently observe that SGD-trained models on the LMS-5 and MS-(5,7) datasets exhibit extreme SB: they rely exclusively on the simplest feature S and remain invariant to all complex features Sc. Using the S-randomized and Sc-randomized metrics summarized in Table 1, they first establish extreme SB for fully-connected (FCN), convolutional (CNN), and sequential (GRU [15]) models.
  • The authors use S-randomized and Sc-randomized metrics to establish that models trained on synthetic and image-based datasets exhibit extreme SB: if all features have full predictive power, NNs rely exclusively on the simplest feature S and remain invariant to all complex features Sc (a minimal sketch of this randomization diagnostic appears after this list). The authors further validate extreme SB across model architectures, activation functions, optimizers, and regularization methods such as ℓ2 regularization and dropout in Appendix C.
  • Poor OOD performance: Given that neural networks tend to heavily rely on spurious features [43, 50], state-of-the-art accuracies on large and diverse validation sets provide a false sense of security; even benign distributional changes to the data during prediction time can drastically degrade or even nullify model performance.
  • Randomizing all complex features Sc—5-slabs in LMS-5, 7-slabs in MS-(5,7), the CIFAR block in MNIST-CIFAR—has negligible effect on the trained neural networks: Sc-randomized and original logits essentially overlap, even though Sc and S have equal predictive power. This further implies that approaches [28, 38] that aim to detect distribution shifts based on model outputs such as logits or softmax probabilities may themselves fail due to extreme SB.
  • Through theoretical analysis and extensive experiments on synthetic and image-based datasets, we (a) establish that SB is extreme in nature across model architectures and datasets and (b) show that extreme SB can result in poor OOD performance and adversarial vulnerability, even when all simple and complex features have equal predictive power.
  • In Appendix D, the authors show that (a) this phenomenon holds on other datasets, (b) increasing the learning rate does not improve generalization on LMS-7 data, and (c) deeper models exhibit a stronger bias towards noisy-but-simple features.
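
As a concrete illustration of the S-randomized and Sc-randomized evaluation, the sketch below re-evaluates a trained classifier after shuffling one block of feature columns across the test set, which destroys that block's association with the label while preserving its marginal distribution. It is a minimal approximation assuming a scikit-learn-style predict interface and column-index feature blocks; the paper's exact randomization protocol and AUC-based metrics may differ.

    # Block-randomized accuracy: permute one feature block across examples and
    # re-evaluate. Assumes model.predict(X) returns hard labels and that feature
    # blocks are given as column-index arrays (an illustrative interface).
    import numpy as np

    def block_randomized_accuracy(model, X, y, block_cols, seed=0):
        rng = np.random.default_rng(seed)
        X_rand = X.copy()
        perm = rng.permutation(len(X))
        X_rand[:, block_cols] = X[perm][:, block_cols]   # decorrelate block from y
        return float((model.predict(X_rand) == y).mean())

    # Reading the numbers as in Table 1: if randomizing S drops accuracy to chance
    # while randomizing Sc leaves it unchanged, the model relies exclusively on S
    # and is invariant to Sc.
    # acc_s  = block_randomized_accuracy(model, X_test, y_test, simple_cols)
    # acc_sc = block_randomized_accuracy(model, X_test, y_test, complex_cols)
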
Conclusion
  • When datasets have multiple simple features (e.g., multiple linear coordinates in LMS-5 or multiple 5-slab coordinates in MS-(5,7)), ensembles of independently trained models mitigate SB to some extent by aggregating predictions based on multiple simple features; a minimal ensembling sketch appears after this list.
  • Given the important implications of SB, the authors hope these datasets serve as a useful testbed for devising better training procedures
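
Below is a minimal sketch of the ensembling idea mentioned above, assuming each member exposes a scikit-learn-style predict_proba; the intent is only to show that averaging independently trained members can aggregate different simple features, not to reproduce the paper's exact ensembling setup.

    # Average class probabilities across independently trained members. When a
    # dataset contains several equally simple features, different members may
    # latch onto different ones, so the average draws on more of them than any
    # single model does.
    import numpy as np

    def ensemble_predict(models, X):
        probs = np.mean([m.predict_proba(X) for m in models], axis=0)
        return probs.argmax(axis=1)

Training each member from a different random initialization and data shuffle is what gives the members a chance to pick up different simple features in the first place.
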
Tables
  • Table 1: If the {S, Sc}-randomized metrics of a model behave as above, then that model relies exclusively on S and is invariant to Sc
  • Table 2: Extreme SB can hurt generalization: FCNs of varying depth and width trained on LMS-7 data with SGD and Adam [36] obtain approximately 100% train accuracy but at most 90% test accuracy
  • Table 3: Three MNIST-CIFAR datasets. We use MNIST-CIFAR:A in the paper. In MNIST-CIFAR:B, we use different MNIST classes: digits 1 and 4. In MNIST-CIFAR:C, we use different CIFAR10 classes: airplane and ship. Our results in Section 4 hold on all three MNIST-CIFAR datasets (a construction sketch for such concatenated inputs appears after this list)
  • Table 4: Extreme SB across models trained on synthetic and image-based datasets: all models rely exclusively on the simplest feature S and remain completely invariant to all complex features Sc
  • Table 5: (Extreme SB in three MNIST-CIFAR datasets) Standard and randomized AUCs of four state-of-the-art CNNs trained on three MNIST-CIFAR datasets. The AUC values collectively indicate that all models rely exclusively on the MNIST block
  • Table 6: (Effect of activation functions and optimizers) (100, 2)-FCNs with multiple activation functions—ReLU, Leaky ReLU [39], PReLU [27], and Tanh—trained on LMS-5 data using common first-order optimization methods—SGD, Adam [36], and RMSProp [67]—exhibit extreme SB
  • Table 7: Dropout and ℓ2 regularization have no effect on the extreme SB of FCNs trained on LMS-7 datasets. The standard and {S, Sc}-randomized AUC values of (100,1)-FCNs and (100,2)-FCNs collectively indicate that the models still latch exclusively onto S (the linear block) and remain invariant to Sc (the 7-slab blocks)
  • Table 8: (Effect of ℓ2 regularization) Increasing the ℓ2 regularization parameter λ from 10⁻⁶ to 10⁻² reduces the extent to which FCNs overfit to the noisy linear component. However, increasing λ does not make FCNs learn the 7-slab components and obtain 100% test accuracy—all models continue to overfit to the noisy linear component and obtain 90% test accuracy
  • Table 9: (Generalization on MS-(5,7) data) FCNs obtain test accuracy ≈ 90% on MS-(5,7) data that includes a noisy 5-slab block. In contrast, FCNs obtain 100% test accuracy on MS-(5,7) data with the noisy 5-slab block removed, given the same sample size
  • Table 10: Adversarial training on MNIST-CIFAR: The table presents standard, ε-robust, and CIFAR10-randomized accuracies for MobileNetV2, DenseNet121, and ResNet50 trained using standard SGD and adversarial training. While adversarial training significantly improves ε-robust accuracy, it does not encourage models to learn complex features (the CIFAR10 block in this case). The CIFAR10-randomized accuracies indicate that adversarially trained models do not mitigate extreme SB, as they rely exclusively on the MNIST block
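
The construction sketch below, referenced from the Table 3 caption, illustrates how an MNIST-CIFAR-style input can be formed by vertically stacking a binary-class MNIST block on top of a binary-class CIFAR10 block with matching labels. It assumes the two image sets are already preprocessed to the same width and channel count; the array names, class pairings, and preprocessing here are illustrative and should not be read as the paper's exact recipe.

    # Pair each MNIST image with a random CIFAR10 image of the same binary class
    # and stack them along the height axis (MNIST block on top, CIFAR10 below).
    # Assumes arrays of shape (n, H, W, C) with matching W and C, and binary labels.
    import numpy as np

    def make_mnist_cifar(mnist_x, mnist_y, cifar_x, cifar_y, seed=0):
        rng = np.random.default_rng(seed)
        images, labels = [], []
        for c in (0, 1):
            m = mnist_x[mnist_y == c]
            k = cifar_x[cifar_y == c]
            partners = k[rng.integers(0, len(k), size=len(m))]     # random same-class CIFAR partners
            images.append(np.concatenate([m, partners], axis=1))   # stack along height
            labels.append(np.full(len(m), c))
        return np.concatenate(images), np.concatenate(labels)
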
Related work
  • Given space constraints, we only discuss directly related work and defer the rest to Appendix A.

    Out-of-Distribution (OOD) performance: Several works demonstrate that NNs tend to learn spurious features & low-level statistical patterns rather than semantic features & high-level abstractions, resulting in poor OOD performance [34, 20, 43, 50]. This phenomenon has been exploited to design backdoor attacks against NNs [6, 12] as well. Recent works [72, 71] that encourage models to learn higher-level features improve OOD performance, but require domain-specific knowledge to penalize reliance on spurious features such as image texture [20] and annotation artifacts [25] in vision & language tasks. Learning robust representations without domain knowledge, however, necessitates formalizing the notion of features and feature reliance; our work takes a step in this direction.

    Adversarial robustness: Neural networks exhibit vulnerability to small adversarial perturbations [66]. Standard approaches to mitigate this issue—adversarial training [22, 40] and ensembles [64, 51, 35]—have had limited success on large-scale datasets. Consequently, several works have investigated reasons underlying the existence of adversarial examples: [22] suggests local linearity of trained NNs, [58] indicates insufficient data, [59] suggests inevitability in high dimensions, [9] suggests computational barriers, [16] proposes limitations of neural network architectures, and [31] proposes the presence of non-robust features. Additionally, Jacobsen et al. [32] show that NNs exhibit invariance to large label-relevant perturbations. Prior works have also demonstrated the existence of universal adversarial perturbations (UAPs) that are agnostic to model and data [52, 45, 68].
Reference
  • Devansh Arpit, Stanisław Jastrzebski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, et al. A closer look at memorization in deep networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 233–242. JMLR.org, 2017.
  • Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
  • Peter L Bartlett. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2):525–536, 1998.
  • Peter L Bartlett, Dylan J Foster, and Matus J Telgarsky. Spectrally-normalized margin bounds for neural networks. In Advances in Neural Information Processing Systems, pages 6240–6249, 2017.
  • Anand Bhattad, Min Jin Chong, Kaizhao Liang, Bo Li, and David Forsyth. Unrestricted adversarial examples via semantic manipulation. In International Conference on Learning Representations, 2020.
  • Battista Biggio, Blaine Nelson, and Pavel Laskov. Poisoning attacks against support vector machines. arXiv preprint arXiv:1206.6389, 2012.
  • Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248, 2017.
  • Alon Brutzkus, Amir Globerson, Eran Malach, and Shai Shalev-Shwartz. SGD learns over-parameterized networks that provably generalize on linearly separable data. arXiv preprint arXiv:1710.10174, 2017.
  • Sébastien Bubeck, Eric Price, and Ilya Razenshteyn. Adversarial examples from computational constraints. arXiv preprint arXiv:1805.10204, 2018.
  • Jacob Buckman, Aurko Roy, Colin Raffel, and Ian Goodfellow. Thermometer encoding: One hot way to resist adversarial examples. In International Conference on Learning Representations, 2018.
  • Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016.
  • Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526, 2017.
  • Lenaic Chizat and Francis Bach. Implicit bias of gradient descent for wide two-layer neural networks trained with the logistic loss. arXiv preprint arXiv:2002.04486, 2020.
  • Lenaic Chizat, Edouard Oyallon, and Francis Bach. On lazy training in differentiable programming. In Advances in Neural Information Processing Systems, pages 2933–2943, 2019.
  • Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
  • Akshay Degwekar, Preetum Nakkiran, and Vinod Vaikuntanathan. Computational limitations in robust classification and win-win results. arXiv preprint arXiv:1902.01086, 2019.
  • Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Xiaolin Hu, J Li, and J Zhu. Boosting adversarial attacks with momentum. arXiv preprint arXiv:1710.06081, 2017.
  • Gintare Karolina Dziugaite and Daniel M Roy. Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. arXiv preprint arXiv:1703.11008, 2017.
  • Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. Exploring the landscape of spatial robustness. arXiv preprint arXiv:1712.02779, 2017.
  • Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A Wichmann, and Wieland Brendel. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv preprint arXiv:1811.12231, 2018.
  • Noah Golowich, Alexander Rakhlin, and Ohad Shamir. Size-independent sample complexity of neural networks. arXiv preprint arXiv:1712.06541, 2017.
  • Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
  • Suriya Gunasekar, Jason D Lee, Daniel Soudry, and Nati Srebro. Implicit bias of gradient descent on linear convolutional networks. In Advances in Neural Information Processing Systems, pages 9461–9471, 2018.
  • Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 1321–1330. JMLR.org, 2017.
  • Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R Bowman, and Noah A Smith. Annotation artifacts in natural language inference data. arXiv preprint arXiv:1803.02324, 2018.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015.
  • Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136, 2016.
  • Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. Natural adversarial examples. arXiv preprint arXiv:1907.07174, 2019.
  • Gao Huang, Zhuang Liu, and Kilian Q Weinberger. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993, 2016.
  • Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. Adversarial examples are not bugs, they are features. In Advances in Neural Information Processing Systems, pages 125–136, 2019.
  • Joern-Henrik Jacobsen, Jens Behrmann, Richard Zemel, and Matthias Bethge. Excessive invariance causes adversarial vulnerability. In International Conference on Learning Representations, 2019.
  • Ziwei Ji and Matus Telgarsky. The implicit bias of gradient descent on nonseparable data. In Conference on Learning Theory, pages 1772–1798, 2019.
  • Jason Jo and Yoshua Bengio. Measuring the tendency of CNNs to learn surface statistical regularities. arXiv preprint arXiv:1711.11561, 2017.
  • Sanjay Kariyappa and Moinuddin K Qureshi. Improving adversarial robustness of ensembles with diversity training. arXiv preprint arXiv:1901.09981, 2019.
  • Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. arXiv preprint arXiv:1612.01474, 2016.
  • Shiyu Liang, Yixuan Li, and Rayadurgam Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. arXiv preprint arXiv:1706.02690, 2017.
  • Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, volume 30, page 3, 2013.
  • Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
  • Karttikeya Mangalam and Vinay Uday Prabhu. Do deep neural networks learn shallow learnable examples first? 2019.
  • Tom McCoy, Ellie Pavlick, and Tal Linzen. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3428–3448, 2019.
  • Tom McCoy, Ellie Pavlick, and Tal Linzen. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3428–3448, Florence, Italy, July 2019. Association for Computational Linguistics.
  • Song Mei, Andrea Montanari, and Phan-Minh Nguyen. A mean field view of the landscape of two-layer neural networks. Proceedings of the National Academy of Sciences, 115(33):E7665–E7671, 2018.
  • Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1765–1773, 2017.
  • Vaishnavh Nagarajan and J Zico Kolter. Uniform convergence may be unable to explain generalization in deep learning. In Advances in Neural Information Processing Systems, pages 11611–11622, 2019.
  • Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Benjamin L Edelman, Fred Zhang, and Boaz Barak. SGD on neural networks learns functions of increasing complexity. arXiv preprint arXiv:1905.11604, 2019.
  • Behnam Neyshabur, Srinadh Bhojanapalli, and Nathan Srebro. A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks. arXiv preprint arXiv:1707.09564, 2017.
  • Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 427–436, 2015.
  • Luke Oakden-Rayner, Jared Dunnmon, Gustavo Carneiro, and Christopher Ré. Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. arXiv preprint arXiv:1909.12475, 2019.
  • Tianyu Pang, Kun Xu, Chao Du, Ning Chen, and Jun Zhu. Improving adversarial robustness via promoting ensemble diversity. arXiv preprint arXiv:1901.08846, 2019.
  • Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.
  • Nicolas Papernot, Patrick D McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, 2016.
  • Jie Ren, Peter J Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark Depristo, Joshua Dillon, and Balaji Lakshminarayanan. Likelihood ratios for out-of-distribution detection. In Advances in Neural Information Processing Systems, pages 14680–14691, 2019.
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
  • Kevin Roth, Yannic Kilcher, and Thomas Hofmann. The odds are odd: A statistical test for detecting adversarial examples. arXiv preprint arXiv:1902.04818, 2019.
  • Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018.
  • Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems, pages 5014–5026, 2018.
  • Ali Shafahi, W Ronny Huang, Christoph Studer, Soheil Feizi, and Tom Goldstein. Are adversarial examples inevitable? arXiv preprint arXiv:1809.02104, 2018.
  • Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! In Advances in Neural Information Processing Systems, pages 3353–3364, 2019.
  • Becks Simpson, Francis Dutil, Yoshua Bengio, and Joseph Paul Cohen. GradMask: Reduce overfitting by regularizing saliency. arXiv preprint arXiv:1904.07478, 2019.
  • Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, and Nathan Srebro. The implicit bias of gradient descent on separable data. The Journal of Machine Learning Research, 19(1):2822–2878, 2018.
  • Nitish Srivastava. Improving neural networks with dropout. University of Toronto, 2013.
  • Thilo Strauss, Markus Hanselmann, Andrej Junginger, and Holger Ulmer. Ensemble methods as a defense to adversarial perturbations against deep neural networks. arXiv preprint arXiv:1709.03423, 2017.
  • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
  • Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  • Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2):26–31, 2012.
  • Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. The space of transferable adversarial examples. arXiv preprint arXiv:1704.03453, 2017.
  • Guillermo Valle-Perez, Chico Q. Camargo, and Ard A. Louis. Deep learning generalizes because the parameter-function map is biased towards simple functions. In International Conference on Learning Representations, 2019.
  • Martin J Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint, volume 48. Cambridge University Press, 2019.
  • Haohan Wang, Songwei Ge, Zachary Lipton, and Eric P Xing. Learning robust global representations by penalizing local predictive power. In Advances in Neural Information Processing Systems, pages 10506–10518, 2019.
  • Haohan Wang, Zexue He, and Eric P. Xing. Learning robust representations by projecting superficial statistics out. In International Conference on Learning Representations, 2019.
  • Eric Wong, Leslie Rice, and J Zico Kolter. Fast is better than free: Revisiting adversarial training. arXiv preprint arXiv:2001.03994, 2020.
  • Blake Woodworth, Suriya Gunasekar, Jason D Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, and Nathan Srebro. Kernel and rich regimes in overparametrized models. arXiv preprint arXiv:2002.09277, 2020.
  • Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan L Yuille, and Kaiming He. Feature denoising for improving adversarial robustness. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 501–509, 2019.
  • Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.
  • Valentina Zantedeschi, Maria-Irina Nicolae, and Ambrish Rawat. Efficient defenses against adversarial attacks. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 39–49, 2017.
  • C Zhang, S Bengio, M Hardt, B Recht, and O Vinyals. Understanding deep learning requires rethinking generalization. In International Conference on Learning Representations, 2017.
  • Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P Xing, Laurent El Ghaoui, and Michael I Jordan. Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573, 2019.