# On the Sample Complexity of Adversarial Multi-Source PAC Learning

ICML 2020.

Abstract:

We study the problem of learning from multiple untrusted data sources, a scenario of increasing practical relevance given the recent emergence of crowdsourcing and collaborative learning paradigms. Specifically, we analyze the situation in which a learning system obtains datasets from multiple sources, some of which might be biased or e…

Introduction

- An important problem of current machine learning research is to make learned systems more trustworthy.
- Robustness at training time is represented less prominently, despite being of great importance.
- One reason might be that learning from a potentially adversarial data source is very hard: a classic result states that when a fixed fraction of the training dataset is adversarially corrupted, successful learning in the PAC sense is not possible anymore (Kearns & Li, 1993).
- There exists no robust learning algorithm that could overcome the effects of adversarial corruptions in a constant fraction of the training dataset and approach the optimal model, even in the limit of infinite data
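The contrast between per-sample and per-source corruption can be felt in a toy mean-estimation analogue (a minimal illustrative sketch, not the paper's learner or its PAC setting; all function names and parameters below are hypothetical): a learner that pools all samples is dragged arbitrarily far by a constant fraction of corrupted data, while an aggregate that respects source boundaries can vote a corrupted minority down.

```python
import random
import statistics

def make_sources(n_sources, m, true_mean, corrupt_frac, attack_value, seed=0):
    """Simulate N i.i.d. sources with m samples each; a corrupt_frac
    fraction of the sources is fully adversarial."""
    rng = random.Random(seed)
    n_bad = int(corrupt_frac * n_sources)
    sources = []
    for i in range(n_sources):
        if i < n_bad:   # adversarial source: every sample set to attack_value
            sources.append([attack_value] * m)
        else:           # clean source: Gaussian samples around the true mean
            sources.append([rng.gauss(true_mean, 1.0) for _ in range(m)])
    return sources

def naive_pooled_mean(sources):
    """Ignores the source structure: mean of all N*m samples."""
    flat = [x for src in sources for x in src]
    return sum(flat) / len(flat)

def median_of_source_means(sources):
    """Respects the source structure: the median of the per-source
    means is unaffected by any minority of fully corrupted sources."""
    return statistics.median(sum(src) / len(src) for src in sources)

sources = make_sources(n_sources=50, m=200, true_mean=0.0,
                       corrupt_frac=0.2, attack_value=100.0)
print(naive_pooled_mean(sources))       # pulled far from 0 by the 20% corrupted data
print(median_of_source_means(sources))  # stays close to the true mean 0.0
```

The same 20% corruption rate that breaks the pooled estimate becomes harmless once samples are grouped by source, which is why the per-source sample size m plays a central role in the analysis.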

Highlights

- An important problem of current machine learning research is to make learned systems more trustworthy
- We studied the problem of robust learning from multiple unreliable datasets
- Rephrasing this task as learning from datasets that might be adversarially corrupted, we introduced the formal problem of adversarial learning from multiple sources, which we studied in the classic probably approximately correct setting
- We showed that adversarial multi-source probably approximately correct learning is possible for any hypothesis class with the uniform convergence property, and we provided explicit rates for the excess risk (Theorem 1 and Corollaries)
- The proof is constructive and shows that integrating robustness comes at a minor statistical cost, as our robust learner achieves optimal rates when run on data without manipulations
- We proved that adversarial probably approximately correct learning from multiple sources is far from trivial
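For context, the single-source chain of inequalities behind statements of this type (standard uniform convergence plus ERM, i.e. textbook material rather than the paper's multi-source bound, which additionally accounts for the corrupted fraction α) reads:

```latex
\sup_{h \in \mathcal{H}} \bigl| R(h) - \hat{R}_S(h) \bigr| \le \varepsilon(n, \delta)
\quad \Longrightarrow \quad
R(\hat{h}_{\mathrm{ERM}}) - \inf_{h \in \mathcal{H}} R(h) \le 2\,\varepsilon(n, \delta),
\qquad
\varepsilon(n, \delta) = C \sqrt{\frac{d + \log(1/\delta)}{n}}
\ \text{ for a class of VC dimension } d.
```

Any hypothesis class with such a uniform convergence rate falls under the scope of Theorem 1, which is why the result applies well beyond binary classification.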

Conclusion

- Corollary 1 is in sharp contrast with the situation of single-dataset PAC robustness.
- The authors recover the classic convergence rate for learning from Nm samples in the non-realizable case.
- This fact is interesting, as the robust learner of Theorem 1 does not need to know the value of α for its operation.
- The same algorithm works robustly if the data contains manipulations, yet incurs no unnecessary overhead if all data sources are uncorrupted.
- The authors studied the problem of robust learning from multiple unreliable datasets.
- In order to facilitate successful learning with a constant fraction of corrupted sources, the number of samples per source has to grow

Related work

- To our knowledge, our results are the first to formally characterize the statistical hardness of learning from multiple i.i.d. sources when a constant fraction of them might be adversarially corrupted. There are, however, a number of conceptually related works, which we discuss in the remainder of this section.

Qiao & Valiant (2018), as well as the follow-up works of Chen et al. (2019) and Jain & Orlitsky (2019), aim at estimating discrete distributions from multiple batches of data, some of which have been adversarially corrupted. The main difference from our results is the focus on finite data domains and on estimating the underlying probability distribution rather than learning a hypothesis.

Qiao (2018) studies collaborative binary classification: a learning system has access to multiple training datasets and a subset of them can be adversarially corrupted. In this setup, the uncorrupted sources are allowed to have different input distributions, but share a common labelling function. The author proves that it is possible to robustly learn individual hypotheses for each source, but a single shared hypothesis cannot be learned robustly. For the specific case that all data distributions are identical, the setup matches ours, though only for binary classification in the realizable case, and with a different adversarial model.
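A minimal illustration of the "untrusted batches" idea discussed above (a sketch only; the coordinate-wise median used here is just the simplest robust aggregation in that spirit, not the estimator of Qiao & Valiant (2018) or its follow-ups, and all names are hypothetical):

```python
import random
import statistics
from collections import Counter

def batch_empirical(batch, support):
    """Empirical distribution of a single batch over a finite support."""
    counts = Counter(batch)
    n = len(batch)
    return [counts[s] / n for s in support]

def coordwise_median_estimate(batches, support):
    """Aggregate batch-wise empirical distributions by taking a
    per-coordinate median, then renormalize. A minority of fully
    corrupted batches cannot move any single coordinate far."""
    ests = [batch_empirical(b, support) for b in batches]
    med = [statistics.median(e[i] for e in ests) for i in range(len(support))]
    total = sum(med)
    return [v / total for v in med]

# 10 clean batches from a biased coin (P(1) = 0.3), plus 2 adversarial batches
rng = random.Random(1)
clean = [[1 if rng.random() < 0.3 else 0 for _ in range(500)] for _ in range(10)]
bad = [[1] * 500 for _ in range(2)]
est = coordwise_median_estimate(clean + bad, support=[0, 1])
print(est)  # close to the true distribution [0.7, 0.3]
```

As in the multi-source learning setting, knowing the batch boundaries is what makes robustness possible here; the same corrupted samples scattered across all batches would bias every coordinate.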

References

- Alistarh, D., Allen-Zhu, Z., and Li, J. Byzantine stochastic gradient descent. In Conference on Neural Information Processing Systems (NeurIPS), 2018.
- Awasthi, P., Blum, A., Mansour, Y., et al. Efficient PAC learning from the crowd. In Conference on Computational Learning Theory (COLT), 2017.
- Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., and Vaughan, J. W. A theory of learning from different domains. Machine Learning, 79(1-2):151–175, 2010.
- Bhagoji, A. N., Chakraborty, S., Mittal, P., and Calo, S. Analyzing federated learning through an adversarial lens. In International Conference on Machine Learning (ICML), 2019.
- Blanchard, P., Guerraoui, R., Stainer, J., et al. Machine learning with adversaries: Byzantine tolerant gradient descent. In Conference on Neural Information Processing Systems (NIPS), 2017.
- Bshouty, N. H., Eiron, N., and Kushilevitz, E. PAC learning with nasty noise. Theoretical Computer Science, 288(2):255–275, 2002.
- Chen, S., Li, J., and Moitra, A. Efficiently learning structured distributions from untrusted batches. In ACM Symposium on Theory of Computing (STOC), 2019.
- Chen, Y., Su, L., and Xu, J. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), 1(2):1–25, 2017.
- Cohen, J., Rosenfeld, E., and Kolter, Z. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning (ICML), 2019.
- Diakonikolas, I., Kamath, G., Kane, D., Li, J., Moitra, A., and Stewart, A. Robust estimators in high-dimensions without the computational intractability. SIAM Journal on Computing, 48(2):742–864, 2019.
- Feng, J. On fundamental limits of robust learning. arXiv preprint arXiv:1703.10444, 2017.
- Feng, J., Xu, H., and Mannor, S. Distributed robust learning. arXiv preprint arXiv:1409.5937, 2014.
- Fung, C., Yoon, C. J., and Beschastnikh, I. Mitigating sybils in federated learning poisoning. arXiv preprint arXiv:1808.04866, 2018.
- Hendrycks, D. and Gimpel, K. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In International Conference on Learning Representations (ICLR), 2017.
- Jain, A. and Orlitsky, A. Robust learning of discrete distributions from batches. arXiv preprint arXiv:1911.08532, 2019.
- Kearns, M. and Li, M. Learning in the presence of malicious errors. SIAM Journal on Computing, 1993.
- Kifer, D., Ben-David, S., and Gehrke, J. Detecting change in data streams. In VLDB, 2004.
- Konstantinov, N. and Lampert, C. H. Robust learning from untrusted sources. In International Conference on Machine Learning (ICML), 2019.
- Lee, K., Lee, K., Lee, H., and Shin, J. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Conference on Neural Information Processing Systems (NeurIPS), 2018.
- Liang, S., Li, Y., and Srikant, R. Enhancing the reliability of out-of-distribution image detection in neural networks. In International Conference on Learning Representations (ICLR), 2018.
- Mahloujifar, S., Mahmoody, M., and Mohammed, A. Universal multi-party poisoning attacks. In International Conference on Machine Learning (ICML), 2019.
- Federated learning: Collaborative machine learning without centralized training data. Google Research Blog, https://research.googleblog.com/2017/04/federatedlearning-collaborative.html, 2017.
- Mohri, M. and Medina, A. M. New analysis and algorithm for learning with drifting distributions. In International Conference on Algorithmic Learning Theory (ALT), 2012.
- Mohri, M., Rostamizadeh, A., and Talwalkar, A. Foundations of machine learning. MIT press, 2018.
- Qiao, M. Do outliers ruin collaboration? In International Conference on Machine Learning (ICML), 2018.
- Qiao, M. and Valiant, G. Learning discrete distributions from untrusted batches. In LIPIcs-Leibniz International Proceedings in Informatics, volume 94. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
- Raghunathan, A., Steinhardt, J., and Liang, P. Certified defenses against adversarial examples. In International Conference on Learning Representations (ICLR), 2018.
- Shalev-Shwartz, S. and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge university press, 2014.
- Sheng, V. S. and Zhang, J. Machine learning with crowdsourcing: A brief summary of the past research and future directions. In AAAI Conference on Artificial Intelligence, 2019.
- Singh, G., Gehr, T., Mirman, M., Puschel, M., and Vechev, M. Fast and effective robustness certification. In Conference on Neural Information Processing Systems (NeurIPS), 2018.
- Yin, D., Chen, Y., Ramchandran, K., and Bartlett, P. Byzantine-robust distributed learning: Towards optimal statistical rates. In International Conference on Machine Learning (ICML), 2018.
- Yin, D., Chen, Y., Kannan, R., and Bartlett, P. Defending against saddle point attack in Byzantine-robust distributed learning. In International Conference on Machine Learning (ICML), 2019.

Proof notes

- Proof. We use the standard generalization bound based on Rademacher complexity. Assume that S = {(x1, y1), ..., (xn, yn)} ∼ D^n; then with probability at least 1 − δ over the data (Mohri et al., 2018), sup_{h∈H} (R(h) − R̂_S(h)) ≤ 2 Rad_n(H) + √(log(1/δ)/(2n)), where Rad_n(H) denotes the Rademacher complexity of H.
- Proof. (a) Whenever H is of finite VC-dimension d, there exists a constant C such that the following generalization bound holds (Bousquet et al., 2004): sup_{h∈H} |R(h) − R̂_S(h)| ≤ C √((d + log(1/δ))/n).
- We use a similar proof technique as in the no-free-lunch results of Bshouty et al. (2002) and the classic no-free-lunch theorem, e.g., Theorem 3.20 in Mohri et al. (2018). An overview is as follows: consider a distribution on X that has support at only two points, the common point x1 and the rare point x2.
- To prove part (a), we use the same technique, again considering a distribution on X that has support at only two points.
- This follows directly from the classic no-free-lunch theorem for binary classifiers in the unrealizable case: applying Theorem 3.23 in Mohri et al. (2018) and setting the adversary to be the identity mapping gives the result.
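The quantitative heart of such two-point constructions is a routine calculation (a sketch under the usual conventions, not the paper's exact parameters): the rare point's mass p is chosen just small enough that the point is likely never observed.

```latex
\Pr_{S \sim D^n}\!\left[ x_2 \notin S \right] = (1 - p)^n \;\ge\; 1 - np,
\qquad \text{so } p = \tfrac{1}{2n} \;\Longrightarrow\; \Pr\!\left[ x_2 \notin S \right] \ge \tfrac{1}{2}.
```

When x2 is unseen, the sample carries no information about its label; averaging over the two candidate labelings of x2, any learner errs there with probability at least 1/2, giving an expected excess risk of at least p/4 = Ω(1/n).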
