On the Sample Complexity of Adversarial Multi-Source PAC Learning

Nikola Konstantinov
Elias Frantar

ICML 2020.

Keywords:
PAC learning; distributed learning; m samples; probably approximately correct; multiple sources

Abstract:

We study the problem of learning from multiple untrusted data sources, a scenario of increasing practical relevance given the recent emergence of crowdsourcing and collaborative learning paradigms. Specifically, we analyze the situation in which a learning system obtains datasets from multiple sources, some of which might be biased or e...

Introduction
  • An important problem of current machine learning research is to make learned systems more trustworthy.
  • Robustness at training time is represented less prominently, despite being of great importance.
  • One reason might be that learning from a potentially adversarial data source is very hard: a classic result states that when a fixed fraction of the training dataset is adversarially corrupted, successful learning in the PAC sense is not possible anymore (Kearns & Li, 1993).
  • There exists no robust learning algorithm that can overcome the effects of adversarial corruptions in a constant fraction of the training dataset and approach the optimal model, even in the limit of infinite data (a formal version of this barrier is sketched below)
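For concreteness, the single-source barrier of Kearns & Li (1993) can be stated informally as follows; this is the standard textbook form of the result, not the paper's exact formulation. If a fraction $\beta$ of the training data is controlled by a malicious adversary, then for any non-trivial hypothesis class no learning algorithm can guarantee an error below roughly

$$ \frac{\beta}{1-\beta}, $$

no matter how many training samples are available.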
Highlights
  • An important problem of current machine learning research is to make learned systems more trustworthy
  • We studied the problem of robust learning from multiple unreliable datasets
  • Rephrasing this task as learning from datasets that might be adversarially corrupted, we introduced the formal problem of adversarial learning from multiple sources, which we studied in the classic probably approximately correct setting (sketched after this list)
  • We showed that adversarial multi-source probably approximately correct learning is possible for any hypothesis class with the uniform convergence property, and we provided explicit rates for the excess risk (Theorem 1 and Corollaries)
  • The proof is constructive and shows that integrating robustness comes at a minor statistical cost, as our robust learner achieves optimal rates when run on data without manipulations
  • We proved that adversarial probably approximately correct learning from multiple sources is far from trivial
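For orientation, here is a minimal sketch of the multi-source setting, with N sources, m samples per source, and a fraction α of corrupted sources; the exact quantifiers and adversary model are those of the paper and may differ in detail from this sketch. Each uncorrupted source provides m i.i.d. samples from a common distribution $D$, while an adversary may arbitrarily replace the datasets of up to $\alpha N$ sources. A learner $\mathcal{A}$ is an adversarial multi-source PAC learner for a hypothesis class $\mathcal{H}$ if, for every $D$ and every such adversary, with probability at least $1-\delta$ over the clean data,

$$ \mathcal{R}_D\big(\mathcal{A}(S_1,\dots,S_N)\big) \;\le\; \min_{h\in\mathcal{H}} \mathcal{R}_D(h) \;+\; \varepsilon(N, m, \alpha, \delta), $$

where the excess-risk bound $\varepsilon(N, m, \alpha, \delta)$ tends to zero as N and m grow.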
Conclusion
  • Corollary 1 is in sharp contrast with the situation of single dataset PAC robustness.
  • The authors recover the classic convergence rate for learning from Nm samples in the non-realizable case (the standard form of this rate is recalled after this list)
  • This fact is interesting, as the robust learner of Theorem 1 does not need to know the value of α for its operation.
  • The same algorithm works robustly if the data contains manipulations, yet incurs no unnecessary overhead if all data sources are uncorrupted. The authors studied the problem of robust learning from multiple unreliable datasets.
  • In order to facilitate successful learning with a constant fraction of corrupted sources, the number of samples per source has to grow
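As a point of reference for the statement about Nm samples above: for a hypothesis class of VC-dimension $d$, the classic agnostic (non-realizable) uniform-convergence rate over all $Nm$ pooled samples has the standard form

$$ \mathcal{R}(\hat{h}) \;-\; \min_{h\in\mathcal{H}} \mathcal{R}(h) \;=\; \mathcal{O}\!\left(\sqrt{\frac{d+\log(1/\delta)}{Nm}}\right) $$

with probability at least $1-\delta$. This is the textbook baseline; the paper's corollaries may be stated with different constants or a different complexity measure.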
Related work
  • To our knowledge, our results are the first that formally characterize the statistical hardness of learning from multiple i.i.d. sources, when a constant fraction of them might be adversarially corrupted. There are a number of conceptually related works, though, which we will discuss for the rest of this section.

    Qiao & Valiant (2018), as well as the follow-up works of Chen et al. (2019) and Jain & Orlitsky (2019), aim at estimating discrete distributions from multiple batches of data, some of which have been adversarially corrupted. The main difference to our results is the focus on finite data domains and estimating the underlying probability distribution rather than learning a hypothesis.

    Qiao (2018) studies collaborative binary classification: a learning system has access to multiple training datasets and a subset of them can be adversarially corrupted. In this setup, the uncorrupted sources are allowed to have different input distributions, but share a common labelling function. The author proves that it is possible to robustly learn individual hypotheses for each source, but a single shared hypothesis cannot be learned robustly. For the specific case that all data distributions are identical, the setup matches ours, though only for binary classification in the realizable case, and with a different adversarial model.
Reference
  • Alistarh, D., Allen-Zhu, Z., and Li, J. Byzantine stochastic gradient descent. In Conference on Neural Information Processing Systems (NeurIPS), 2018.
  • Awasthi, P., Blum, A., Mansour, Y., et al. Efficient PAC learning from the crowd. In Conference on Computational Learning Theory (COLT), 2017.
  • Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., and Vaughan, J. W. A theory of learning from different domains. Machine Learning, 79(1-2):151–175, 2010.
  • Bhagoji, A. N., Chakraborty, S., Mittal, P., and Calo, S. Analyzing federated learning through an adversarial lens. In International Conference on Machine Learning (ICML), 2019.
  • Blanchard, P., Guerraoui, R., Stainer, J., et al. Machine learning with adversaries: Byzantine tolerant gradient descent. In Conference on Neural Information Processing Systems (NIPS), 2017.
  • Bshouty, N. H., Eiron, N., and Kushilevitz, E. PAC learning with nasty noise. Theoretical Computer Science, 288(2):255–275, 2002.
  • Chen, S., Li, J., and Moitra, A. Efficiently learning structured distributions from untrusted batches. In ACM Symposium on Theory of Computing (STOC), 2019.
  • Chen, Y., Su, L., and Xu, J. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), 1(2):1–25, 2017.
  • Cohen, J., Rosenfeld, E., and Kolter, Z. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning (ICML), 2019.
  • Diakonikolas, I., Kamath, G., Kane, D., Li, J., Moitra, A., and Stewart, A. Robust estimators in high dimensions without the computational intractability. SIAM Journal on Computing, 48(2):742–864, 2019.
  • Feng, J. On fundamental limits of robust learning. arXiv preprint arXiv:1703.10444, 2017.
  • Feng, J., Xu, H., and Mannor, S. Distributed robust learning. arXiv preprint arXiv:1409.5937, 2014.
  • Fung, C., Yoon, C. J., and Beschastnikh, I. Mitigating sybils in federated learning poisoning. arXiv preprint arXiv:1808.04866, 2018.
  • Hendrycks, D. and Gimpel, K. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In International Conference on Learning Representations (ICLR), 2017.
  • Jain, A. and Orlitsky, A. Robust learning of discrete distributions from batches. arXiv preprint arXiv:1911.08532, 2019.
  • Kearns, M. and Li, M. Learning in the presence of malicious errors. SIAM Journal on Computing, 1993.
  • Kifer, D., Ben-David, S., and Gehrke, J. Detecting change in data streams. In VLDB, 2004.
  • Konstantinov, N. and Lampert, C. H. Robust learning from untrusted sources. In International Conference on Machine Learning (ICML), 2019.
  • Lee, K., Lee, K., Lee, H., and Shin, J. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Conference on Neural Information Processing Systems (NeurIPS), 2018.
  • Liang, S., Li, Y., and Srikant, R. Enhancing the reliability of out-of-distribution image detection in neural networks. In International Conference on Learning Representations (ICLR), 2018.
  • Mahloujifar, S., Mahmoody, M., and Mohammed, A. Universal multi-party poisoning attacks. In International Conference on Machine Learning (ICML), 2019.
  • McMahan, B. and Ramage, D. Federated learning: Collaborative machine learning without centralized training data. https://research.googleblog.com/2017/04/federatedlearning-collaborative.html, 2017.
  • Mohri, M. and Medina, A. M. New analysis and algorithm for learning with drifting distributions. In International Conference on Algorithmic Learning Theory (ALT), 2012.
  • Mohri, M., Rostamizadeh, A., and Talwalkar, A. Foundations of Machine Learning. MIT Press, 2018.
  • Qiao, M. Do outliers ruin collaboration? In International Conference on Machine Learning (ICML), 2018.
  • Qiao, M. and Valiant, G. Learning discrete distributions from untrusted batches. In LIPIcs-Leibniz International Proceedings in Informatics, volume 94. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
  • Raghunathan, A., Steinhardt, J., and Liang, P. Certified defenses against adversarial examples. In International Conference on Learning Representations (ICLR), 2018.
  • Shalev-Shwartz, S. and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
  • Sheng, V. S. and Zhang, J. Machine learning with crowdsourcing: A brief summary of the past research and future directions. In AAAI Conference on Artificial Intelligence, 2019.
  • Singh, G., Gehr, T., Mirman, M., Puschel, M., and Vechev, M. Fast and effective robustness certification. In Conference on Neural Information Processing Systems (NeurIPS), 2018.
  • Yin, D., Chen, Y., Ramchandran, K., and Bartlett, P. Byzantine-robust distributed learning: Towards optimal statistical rates. In International Conference on Machine Learning (ICML), 2018.
  • Yin, D., Chen, Y., Kannan, R., and Bartlett, P. Defending against saddle point attack in Byzantine-robust distributed learning. In International Conference on Machine Learning (ICML), 2019.
Proof excerpts
  • Proof. We use the standard generalization bound based on Rademacher complexity. Assume that $S = \{(x_1, y_1), \dots, (x_n, y_n)\}$ is drawn i.i.d. from $D$; then with probability at least $1-\delta$ over the data (Mohri et al., 2018),
    $$ \sup_{h\in\mathcal{H}} \big( \mathcal{R}(h) - \hat{\mathcal{R}}_S(h) \big) \;\le\; 2\,\mathfrak{R}_n(\ell \circ \mathcal{H}) + \sqrt{\frac{\log(1/\delta)}{2n}}, $$
    where $\ell \circ \mathcal{H}$ denotes the associated loss class.
  • Proof. (a) Whenever H is of finite VC-dimension d, there exists a constant C such that the following generalization bound holds with probability at least $1-\delta$ (Bousquet et al., 2004):
    $$ \sup_{h\in\mathcal{H}} \big| \mathcal{R}(h) - \hat{\mathcal{R}}_S(h) \big| \;\le\; C \sqrt{\frac{d + \log(1/\delta)}{n}}. $$
  • We use a similar proof technique as in the no-free-lunch results in (Bshouty et al., 2002) and in the classic no-free-lunch theorem, e.g. Theorem 3.20 in (Mohri et al., 2018). An overview is as follows. Consider a distribution on X that has support only at two points: the common point x1 and the rare point x2 (a sketch of this construction is given after this list).
  • To prove part (a), we use a similar technique as in the no-free-lunch results in (Bshouty et al., 2002) and in the classic no-free-lunch theorem, e.g. Theorem 3.20 in (Mohri et al., 2018). An overview is as follows. Consider a distribution on X that has support only at two points, the common point x1 and the rare point x2.
  • This follows directly from the classic no-free-lunch theorem for binary classifiers in the unrealizable case. Indeed, applying Theorem 3.23 in (Mohri et al., 2018) and setting the adversary to be the identity mapping gives the result.
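For intuition, here is a minimal sketch of the two-point no-free-lunch construction referenced above; the paper's actual argument additionally has to handle the adversary and may differ in its details. Let the marginal distribution place mass $1-p$ on the common point $x_1$ and mass $p$ on the rare point $x_2$, and let the label of $x_2$ be $+1$ or $-1$ with equal probability. With $n$ i.i.d. samples, $x_2$ remains unobserved with probability $(1-p)^n$, in which case any learner mislabels it with probability $1/2$ and incurs excess risk $p$. Choosing $p \approx 1/n$ therefore yields an expected excess risk of at least

$$ \frac{p}{2}\,(1-p)^n \;=\; \Omega\!\left(\frac{1}{n}\right), $$

which is the standard way such lower bounds are obtained.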