# The Complexity of Adversarially Robust Proper Learning of Halfspaces with Agnostic Noise

NeurIPS, 2020.

Keywords:
computational complexity, adversarial example, lower bound, distribution independent, Exponential Time Hypothesis

Abstract:

We study the computational complexity of adversarially robust proper learning of halfspaces in the distribution-independent agnostic PAC model, with a focus on $L_p$ perturbations. We give a computationally efficient learning algorithm and a nearly matching computational hardness result for this problem. An interesting implication of our …

Introduction
• One of the main concrete goals in this context has been to develop classifiers that are robust to adversarial examples, i.e., small imperceptible perturbations to the input that can result in erroneous misclassification [BCM+13, SZS+14, GSS15].
• This has led to an explosion of research on designing defenses against adversarial examples and attacks on these defenses.
• The authors study the learnability of halfspaces in this model with respect to $L_p$ perturbations.
Highlights
• In recent years, the design of reliable machine learning systems for security-critical applications, including computer vision and natural language processing, has been a major goal in the field.

One of the main concrete goals in this context has been to develop classifiers that are robust to adversarial examples, i.e., small imperceptible perturbations to the input that can result in erroneous misclassification [BCM+13, SZS+14, GSS15]
• We focus on understanding the computational complexity of adversarially robust classification in the agnostic PAC model [Hau92, KSS94]
• We studied the computational complexity of adversarially robust learning of halfspaces in the distribution-independent agnostic PAC model
• We provided a simple proper learning algorithm for this problem and a nearly matching computational lower bound
• While proper learners are typically preferable due to their interpretability, the obvious open question is whether significantly faster non-proper learners are possible
Results
• Prior hardness results for agnostically learning halfspaces proceed via reductions from Label Cover [ABSS97, FGKP06, GR09, FGRW12, DKM19].
• These reductions use gadgets which are “local” in nature.
• As the authors explain, such “local” reductions cannot work for their purpose.
• It is convenient to think of each sample (x, y) as a linear constraint ⟨w, x⟩ ≥ 0 when y = +1 and ⟨w, x⟩ < 0 when y = −1, where the variables are the coordinates w1, . . . , wd.
• For this purpose, the authors want (i) the halfspace w to lie in B^d_1, i.e., |w1| + · · · + |wd| ≤ 1, and (ii) each of the samples x to lie in B^d_∞, i.e., |x1|, . . . , |xd| ≤ 1.
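The constraint view above can be made concrete. The following NumPy sketch (illustrative only, not the authors' algorithm; the function names are mine) computes the worst-case margin of a halfspace under an L_p perturbation of the input, using the standard dual-norm identity min over ‖δ‖_p ≤ γ of y⟨w, x + δ⟩ = y⟨w, x⟩ − γ‖w‖_q with 1/p + 1/q = 1; for L_∞ perturbations the relevant dual norm is ‖w‖_1, matching the B^d_1 / B^d_∞ pairing in the text.

```python
import numpy as np

def robust_margin(w, x, y, gamma, p=np.inf):
    """Worst-case margin of the halfspace sign(<w, .>) on (x, y) when an
    adversary may move x by any delta with ||delta||_p <= gamma.

    By duality, the adversary's best move yields
    y * <w, x> - gamma * ||w||_q, where 1/p + 1/q = 1."""
    if p == np.inf:
        q = 1.0
    elif p == 1:
        q = np.inf
    else:
        q = p / (p - 1.0)
    return y * np.dot(w, x) - gamma * np.linalg.norm(w, ord=q)

def robust_error(w, samples, gamma, p=np.inf):
    """Fraction of labeled samples the L_p-bounded adversary can flip."""
    return float(np.mean([robust_margin(w, x, y, gamma, p) <= 0
                          for x, y in samples]))

# Example: w in the L_1 ball, samples in the L_infty ball, as in the text.
w = np.array([0.5, 0.5])                    # ||w||_1 = 1
samples = [(np.array([1.0, 1.0]), +1),
           (np.array([-1.0, -1.0]), -1)]
print(robust_error(w, samples, gamma=0.3))  # → 0.0 (radius 0.3 flips nothing)
```

A negative robust margin means the adversary can push the sample across the separating hyperplane, so the robust error is simply the fraction of samples whose worst-case margin is non-positive.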
Conclusion

In this work, the authors studied the computational complexity of adversarially robust learning of halfspaces in the distribution-independent agnostic PAC model.
• While proper learners are typically preferable due to their interpretability, the obvious open question is whether significantly faster non-proper learners are possible.
• The authors leave this as an interesting open problem.
• Another direction for future work is to understand the effect of distributional assumptions on the complexity of the problem and to explore the learnability of simple neural networks in this context.
Summary
• ## Objectives:

For some constants 0 < ν < 1 and α > 1, the goal is to efficiently compute a hypothesis h that, with high probability, achieves robust misclassification error competitive with that of the best halfspace.
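The guarantee itself is truncated in the source. One natural reading, consistent with the roles of the constants ν < 1 (shrinking the perturbation radius) and α > 1 (approximating the error), and stated here as an interpretation rather than a quotation, is that for a target margin γ the learner outputs h with

```latex
\Pr\left[\,\mathrm{err}^{\,\nu\gamma}_{\mathcal{D}}(h) \;\le\; \alpha \cdot \mathrm{OPT}^{\,\gamma}_{\mathcal{D}} + \epsilon\,\right] \;\ge\; 1 - \delta,
```

where err^{νγ}_D(h) is the misclassification error of h under L_p perturbations of radius νγ, and OPT^γ_D is the smallest γ-robust error achievable by any halfspace.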
Related work
• A sequence of recent works [CBM18, SST+18, BLPR19, MHS19] has studied the sample complexity of adversarially robust PAC learning for general concept classes of bounded VC dimension and for halfspaces in particular. [MHS19] established an upper bound on the sample complexity of PAC learning any concept class with finite VC dimension. A common implication of the aforementioned works is that, for some concept classes, the sample complexity of adversarially robust PAC learning is higher than the sample complexity of (standard) PAC learning. For the class of halfspaces, which is the focus of the current paper, the sample complexity of adversarially robust agnostic PAC learning was shown to be essentially the same as that of (standard) agnostic PAC learning [CBM18, MHS19].

Turning to computational aspects, [BLPR19, DNV19] showed that there exist classification tasks that are efficiently learnable in the standard PAC model but computationally hard in the adversarially robust setting (under cryptographic assumptions). Notably, the classification problems shown hard are artificial, in the sense that they do not correspond to natural concept classes. [ADV19] showed that adversarially robust proper learning of degree-2 polynomial threshold functions is computationally hard, even in the realizable setting. On the positive side, [ADV19] gave a polynomial-time algorithm for adversarially robust learning of halfspaces under L∞ perturbations, again in the realizable setting. More recently, [MGDS20] generalized this upper bound to a broad class of perturbations, including Lp perturbations. Moreover, [MGDS20] gave an efficient algorithm for learning halfspaces with random classification noise [AL88]. We note that all these algorithms are proper.
Funding
• We note that our algorithm has significantly better dependence on the parameter δ (quantifying the approximation ratio), and better dependence on 1/γ.
References
• [ABSS97] Sanjeev Arora, László Babai, Jacques Stern, and Z. Sweedyk. The hardness of approximate optima in lattices, codes, and systems of linear equations. J. Comput. Syst. Sci., 54(2):317–331, 1997.
• [ADV19] Pranjal Awasthi, Abhratanu Dutta, and Aravindan Vijayaraghavan. On robustness to adversarial examples and polynomial optimization. In Advances in Neural Information Processing Systems, pages 13737–13747, 2019.
• [AL88] Dana Angluin and Philip Laird. Learning from noisy examples. Mach. Learn., 2(4):343–370, 1988.
• Divesh Aggarwal and Noah Stephens-Davidowitz. (Gap/S)ETH hardness of SVP. In STOC, pages 228–238, 2018.
• Zsolt Baranyai. On the factorization of the complete uniform hypergraph. Infinite and Finite Sets, Proc. Coll. Keszthely, 10:91–107, 1975.
• Maria-Florina Balcan and Christopher Berlind. A new perspective on learning linear separators with large $L_q L_p$ margins. In AISTATS, pages 68–76, 2014.
• [BBE+19] Arnab Bhattacharyya, Édouard Bonnet, László Egri, Suprovat Ghoshal, Karthik C. S., Bingkai Lin, Pasin Manurangsi, and Dániel Marx. Parameterized intractability of even set and shortest vector problem. Electronic Colloquium on Computational Complexity (ECCC), 26:115, 2019.
• [BCM+13] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Srndic, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In ECML PKDD, pages 387–402, 2013.
• Andrew C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates. Transactions of the American Mathematical Society, 49(1):122–136, 1941.
• [BGKM18] Arnab Bhattacharyya, Suprovat Ghoshal, Karthik C. S., and Pasin Manurangsi. Parameterized intractability of even set and shortest vector problem from Gap-ETH. In ICALP, pages 17:1–17:15, 2018.
• [BGS17] Huck Bennett, Alexander Golovnev, and Noah Stephens-Davidowitz. On the quantitative hardness of CVP. In FOCS, pages 13–24, 2017.
• [BLPR19] Sebastien Bubeck, Yin-Tat Lee, Eric Price, and Ilya P. Razenshteyn. Adversarial examples from computational constraints. In ICML, pages 831–840, 2019.
• Peter L. Bartlett and Shahar Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.
• Shai Ben-David and Hans Ulrich Simon. Efficient learning of linear perceptrons. In Advances in Neural Information Processing Systems, pages 189–195, 2000.
• Aharon Birnbaum and Shai Shalev-Shwartz. Learning halfspaces with the zero-one loss: Time-accuracy tradeoffs. In Advances in Neural Information Processing Systems, pages 935–943, 2012.
• [CBM18] Daniel Cullina, Arjun Nitin Bhagoji, and Prateek Mittal. PAC-learning in the presence of adversaries. In Advances in Neural Information Processing Systems, pages 228–239, 2018.
• [CCK+17] Parinya Chalermsook, Marek Cygan, Guy Kortsarz, Bundit Laekhanukit, Pasin Manurangsi, Danupon Nanongkai, and Luca Trevisan. From Gap-ETH to FPT-inapproximability: Clique, dominating set, and more. In FOCS, pages 743–754, 2017.
• [CGK+19] Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight FPT approximations for k-median and k-means. In ICALP, pages 42:1–42:14, 2019.
• Yijia Chen and Bingkai Lin. The constant inapproximability of the parameterized dominating set problem. SIAM J. Comput., 48(2):513–533, 2019.
• Zico Kolter and Aleksander Madry. Adversarial robustness - theory and practice. NeurIPS 2018 tutorial, available at https://adversarial-ml-tutorial.org/, 2018.
• Irit Dinur. Mildly exponential reduction from gap 3SAT to polynomial-gap label-cover. Electronic Colloquium on Computational Complexity (ECCC), 23:128, 2016.
• [DKM19] Ilias Diakonikolas, Daniel Kane, and Pasin Manurangsi. Nearly tight bounds for robust proper learning of halfspaces with a margin. In Advances in Neural Information Processing Systems, pages 10473–10484, 2019.
• [DM18] Irit Dinur and Pasin Manurangsi. ETH-hardness of approximating 2-CSPs and directed steiner network. In ITCS, pages 36:1–36:20, 2018.
• [DNV19] Akshay Degwekar, Preetum Nakkiran, and Vinod Vaikuntanathan. Computational limitations in robust classification and win-win results. In COLT, pages 994–1028, 2019.
• Irit Dinur and David Steurer. Analytical approach to parallel repetition. In STOC, pages 624–633, 2014.
• Carl-Gustav Esseen. On the Liapunoff limit of error in the theory of probability. Arkiv för matematik, astronomi och fysik, A28:1–19, 1942.
• Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634–652, 1998.
• [FGKP06] Vitaly Feldman, Parikshit Gopalan, Subhash Khot, and Ashok Kumar Ponnuswami. New results for learning noisy parities and halfspaces. In FOCS, pages 563–574, 2006.
• [FGRW12] Vitaly Feldman, Venkatesan Guruswami, Prasad Raghavendra, and Yi Wu. Agnostic learning of monomials by halfspaces is hard. SIAM J. Comput., 41(6):1558–1590, 2012.
• Yoav Freund and Robert Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55(1):119–139, 1997.
• [Gen01a] Claudio Gentile. A new approximate maximal margin classification algorithm. J. Mach. Learn. Res., 2:213–242, 2001.
• [Gen01b] Claudio Gentile. A new approximate maximal margin classification algorithm. Journal of Machine Learning Research, 2:213–242, 2001.
• [Gen03] Claudio Gentile. The robustness of the p-norm algorithms. Mach. Learn., 53(3):265–299, 2003.
• [GLS01] Adam J. Grove, Nick Littlestone, and Dale Schuurmans. General convergence results for linear discriminant updates. Mach. Learn., 43(3):173–210, 2001.
• [GR09] Venkatesan Guruswami and Prasad Raghavendra. Hardness of learning halfspaces with noise. SIAM J. Comput., 39(2):742–765, 2009.
• [GSS15] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
• [Hås96] Johan Håstad. Clique is hard to approximate within n^{1−ε}. In FOCS, pages 627–636, 1996.
• [Hås01] Johan Håstad. Some optimal inapproximability results. J. ACM, 48(4):798–859, 2001.
• [Hau92] David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100:78–150, 1992.
• Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. J. Comput. Syst. Sci., 62(2):367–375, 2001.
• Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Comput. Syst. Sci., 63(4):512–530, 2001.
• Vishesh Jain, Frederic Koehler, and Andrej Risteski. Mean-field approximation, convex hierarchies, and the optimality of correlation rounding: a unified perspective. In STOC, pages 1226–1236, 2019.
• [KLM19] Karthik C. S., Bundit Laekhanukit, and Pasin Manurangsi. On the parameterized complexity of approximating dominating set. J. ACM, 66(5):33:1–33:38, 2019.
• Vladimir Koltchinskii and Dmitry Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Statist., 30(1):1–50, 2002.
• [KSS94] Michael Kearns, Robert Schapire, and Linda Sellie. Toward Efficient Agnostic Learning. Machine Learning, 17(2/3):115–141, 1994.
• Sham M. Kakade, Karthik Sridharan, and Ambuj Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In Advances in Neural Information Processing Systems, pages 793–800, 2008.
• Bingkai Lin. A simple gap-producing reduction for the parameterized set cover problem. In ICALP, pages 81:1–81:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
• Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285–318, 1987.
• [LMS11] Daniel Lokshtanov, Dániel Marx, and Saket Saurabh. Lower bounds based on the exponential time hypothesis. Bulletin of the EATCS, 105:41–72, 2011.
• Phil Long and Rocco Servedio. Learning large-margin halfspaces with more malicious noise. Advances in Neural Information Processing Systems, 2011.
• Pasin Manurangsi. Tight running time lower bounds for strong inapproximability of maximum k -coverage, unique set cover and related problems (via t-wise agreement testing theorem). In SODA, pages 62–81, 2020.
• [Mar13] Dániel Marx. Completely inapproximable monotone and antimonotone parameterized problems. J. Comput. Syst. Sci., 79(1):144–151, 2013.
• [MGDS20] Omar Montasser, Surbhi Goel, Ilias Diakonikolas, and Nathan Srebro. Efficiently learning adversarially robust halfspaces with noise. CoRR, abs/2005.07652, 2020.
• [MHS19] Omar Montasser, Steve Hanneke, and Nathan Srebro. VC classes are adversarially robustly learnable, but only improperly. In COLT, pages 2512–2530, 2019.
• Dana Moshkovitz and Ran Raz. Two-query PCP with subconstant error. J. ACM, 57(5):29:1–29:29, 2010.
• Pasin Manurangsi and Prasad Raghavendra. A birthday repetition theorem and complexity of approximating dense CSPs. In ICALP, pages 78:1–78:15, 2017.
• [Raz98] Ran Raz. A parallel repetition theorem. SIAM J. Comput., 27(3):763–803, 1998.
• Frank Rosenblatt. The Perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958.
• Shai Shalev-Shwartz, Ohad Shamir, and Karthik Sridharan. Agnostically learning halfspaces with margin errors. Technical report, Toyota Technological Institute, 2009.
• Shai Shalev-Shwartz, Ohad Shamir, and Karthik Sridharan. Learning kernel-based halfspaces with the zero-one loss. In COLT, pages 441–450, 2010.
• [SST+18] Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems, pages 5019–5031, 2018.
• [SZS+14] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.
• [Vap98] Vladimir Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.