# The Power of Comparisons for Actively Learning Linear Classifiers

NeurIPS 2020.

Abstract:

In the world of big data, large but costly-to-label datasets dominate many fields. Active learning, a semi-supervised alternative to the standard PAC-learning model, was introduced to explore whether adaptive labeling could learn concepts with exponentially fewer labeled samples. While previous results show that active learning performs n...

Introduction

- The availability of big data and the high cost of labeling have led to a surge of interest in active learning, an adaptive, semi-supervised learning paradigm.
- For passive RPU-learning with comparison queries, the authors will inherit the lower bound from the PAC model (Proposition 1.3).
- The authors will show that this bound is tight up to a linear factor in dimension, and further that employing comparison queries in general shifts the RPU model from being intractable to losing only a logarithmic factor over PAC-learning in both the passive and active regimes.
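
To make the two query types concrete, here is a minimal sketch. It assumes a hidden separator sign(⟨w, x⟩ + b), where a label query returns the sign at one point and a comparison query reports which of two points has the larger value of ⟨w, x⟩ + b. The names `make_oracles` and `label_pool` are hypothetical, and the routine is a simple sorting baseline, not the paper's algorithm: comparisons order the pool, after which a binary search needs only logarithmically many label queries.

```python
import functools
import random


def make_oracles(w, b):
    """Return (label, compare) oracles for the hidden separator sign(<w, x> + b)."""
    def f(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    def label(x):
        # Label query: which side of the hyperplane x lies on.
        return 1 if f(x) >= 0 else -1

    def compare(x, y):
        # Comparison query: -1, 0, or 1 as f(x) is below, equal to, or above f(y).
        fx, fy = f(x), f(y)
        return (fx > fy) - (fx < fy)

    return label, compare


def label_pool(points, label, compare):
    """Label a pool by sorting with comparisons, then binary searching with labels."""
    order = sorted(points, key=functools.cmp_to_key(compare))
    lo, hi = 0, len(order)  # invariant: order[:lo] is negative, order[hi:] is positive
    label_queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        label_queries += 1
        if label(order[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    labels = {p: (-1 if i < lo else 1) for i, p in enumerate(order)}
    return labels, label_queries


# Illustrative run on a random pool in three dimensions (all values are assumptions).
rng = random.Random(0)
pool = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(200)]
label, compare = make_oracles((0.5, -1.0, 0.25), 0.1)
labels, used = label_pool(pool, label, compare)
print("label queries used:", used)
```

Sorting works because labels are monotone in ⟨w, x⟩ + b, so every negative point precedes every positive point in the sorted order. This baseline still spends on the order of n log n comparison queries on a pool of size n.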

Highlights

- In recent years, the availability of big data and the high cost of labeling have led to a surge of interest in active learning, an adaptive, semi-supervised learning paradigm
- Our goal would be to minimize the number of samples the learner draws before approximately learning the concept class with high probability (PAC-learning)
- Our work adopts a mixture of the approaches of Balcan et al. and Kane et al. We show that, by leveraging comparison queries, non-homogeneous linear separators may be learned in exponentially fewer samples as long as the distribution satisfies weak concentration and anti-concentration bounds, conditions realized by, for instance, s-concave distributions
- Similar to the passive case, for active learning we study the query complexity q(ε, δ), the minimum number of queries needed to learn some pair (X, C) in either the Probably Approximately Correct (PAC) or Reliable and Probably Useful (RPU) learning models
- We directly extend the original algorithm of Balcan and Long to non-homogeneous linear separators via the inclusion of comparison queries, and leverage the concentration results of Balcan and Zhang to provide an inference-based algorithm for learning under s-concave distributions
- Our algorithm labels finite samples drawn from the uniform distribution over the unit ball in d-dimensions
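
The uniform distribution over the unit ball mentioned above can be sampled with a standard construction, sketched here under the assumption that only the Python standard library is available. The function name `sample_unit_ball` is hypothetical. A Gaussian vector gives a uniformly random direction, and a radius of U^(1/d) for uniform U makes the point uniform in volume.

```python
import math
import random


def sample_unit_ball(d, rng=random):
    """Draw one point uniformly from the d-dimensional unit ball."""
    # A standard Gaussian vector is rotation invariant, so its direction is uniform.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    # Volume scales like r^d, so the radius must be distributed as U^(1/d).
    radius = rng.random() ** (1.0 / d)
    return tuple(radius * v / norm for v in g)


# Illustrative draw in five dimensions.
point = sample_unit_ball(5, random.Random(0))
print(len(point), math.sqrt(sum(v * v for v in point)) <= 1.0)
```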

Results

- Random Polytope Complexity ⇒ Lower Bound: Imagine the adversary chooses a distribution such that with high probability, every point that the learner queries is of the same sign.
- Let D be a log-concave distribution over R^d. The query complexity of Comparison-Pool-PAC learning (D, R^d, H_d) is q(ε, δ) = O d + log + log log log
- The authors will confirm that RPU-learning linear separators with only label queries is intractable in high dimensions, but can be made efficient in both the passive and active regimes via comparison queries.
- In the label-only case, the complexity of RPU-learning is lower bounded by the expected number of vertices on a random polytope drawn from the distribution D.
- For simple distributions such as uniform over the unit ball, this gives sample complexity which is exponential in dimension, making RPU-learning impractical for any sort of high-dimensional data.
- The authors' positive results for comparison based RPU-learning rely on weakening the concept of inference dimension to be distribution dependent.
- This lemma shows that RPU-learning (D, X, H) with inverse super-exponential average inference dimension loses only log factors over passive or active PAC-learning.
- Plugging this into the query complexity sets the latter term from Corollary 3.7 to 1.
- The authors will show that by employing comparison queries they can improve the average inference dimension of linear separators from 2^(Ω(−n log(n))) to 2^(−Ω(n^2)), but first they will need to review a result on inference dimension from [5].
- Plugging this result into Corollary 3.7 gives the desired guarantee on Comparison-Pool-RPU learning query complexity.
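
The random-polytope lower bound above can be illustrated with a small simulation. It is restricted to the plane (d = 2) purely for simplicity and is not the paper's analysis. The reasoning it illustrates: when every sampled point has the same label, a label-only learner that must never err can certify only points inside the convex hull of those it has queried, so each hull vertex costs a query. The helper names are hypothetical, and the hull routine is the standard monotone-chain construction.

```python
import math
import random


def cross(o, a, b):
    """Cross product of vectors OA and OB; positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices of 2D points via the monotone-chain method."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def sample_disk(rng):
    """Draw one point uniformly from the unit disk."""
    r = math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return (r * math.cos(theta), r * math.sin(theta))


def mean_hull_vertices(n, trials, rng):
    """Average number of hull vertices over repeated samples of n points."""
    total = 0
    for _ in range(trials):
        total += len(convex_hull([sample_disk(rng) for _ in range(n)]))
    return total / trials


# Illustrative run: the vertex count keeps growing as the sample grows.
rng = random.Random(0)
for n in (25, 100, 400):
    print(n, mean_hull_vertices(n, 20, rng))
```

The growth in two dimensions is mild. The exponential dependence on dimension stated above concerns high-dimensional balls, which this planar sketch does not attempt to reproduce.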

Conclusion

- For H_n ∼ D^n with n ≥ Ω(d log^2(d)), there exists an LDT using only label and comparison queries that solves the point location problem with expected depth
- To match the methodology in lower bounding Label-Pool-RPU learning, the authors will draw the classifier uniformly from hyperplanes tangent to the unit ball.
- If there exist simple relative t-local queries with average inference dimension 2−O(nt) over some distribution D, it would imply a passive RPU-learning algorithm over D with sample complexity n(ε, δ) = O

Summary

- Table 1: Expected sample and query complexity for PAC learning (B^d, R^d, H_d)
- Table 2: Expected sample and query complexity for RPU learning (B^d, R^d, H_d)

Reference

- Sanjoy Dasgupta. Coarse sample complexity bounds for active learning. In Advances in neural information processing systems, pages 235–242, 2006.
- M. Balcan and P. Long. Active and passive learning of linear separators under log-concave distributions. In Proceedings of the 26th Conference on Learning Theory, 2013.
- Maria-Florina F Balcan and Hongyang Zhang. Sample and computationally efficient learning algorithms under s-concave distributions. In Advances in Neural Information Processing Systems, pages 4796–4805, 2017.
- Ran El-Yaniv and Yair Wiener. Active learning via perfect selective classification. Journal of Machine Learning Research, 13(Feb):255–279, 2012.
- D. Kane, S. Lovett, S. Moran, and J. Zhang. Active classification with comparison queries. In IEEE 58th Annual Symposium on Foundations of Computer Science, 2017.
- Benjamin Satzger, Markus Endres, and Werner Kiessling. A preference-based recommender system. In Proceedings of the 7th International Conference on E-Commerce and Web Technologies, EC-Web'06, pages 31–40, Berlin, Heidelberg, 2006. Springer-Verlag. ISBN 3-540-37743-3, 978-3-540-37743-6.
- Yichong Xu, Hongyang Zhang, Kyle Miller, Aarti Singh, and Artur Dubrawski. Noise-tolerant interactive learning using pairwise comparisons. In Advances in Neural Information Processing Systems, pages 2431–2440, 2017.
- R. Rivest and R. Sloan. Learning complicated concepts reliably and usefully. In Proceedings of the First Workshop on Computational Learning Theory, pages 61–71, 1988.
- Chi-Keung Chow. An optimum character recognition system using decision functions. IRE Transactions on Electronic Computers, (4):247–254, 1957.
- J. Kivinen. Learning reliably and with one-sided error. Mathematical Systems Theory, 28(2):141–172, 1995.
- J. Kivinen. Reliable and useful learning with uniform probability distributions. In Proceedings of the First International Workshop on Algorithmic Learning Theory, 1990.
- L. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
- V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.
- Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM (JACM), 36(4):929–965, 1989.
- A. McCallum and K. Nigam. Employing EM and pool-based active learning for text classification. In Proceedings of the Fifteenth International Conference on Machine Learning, 1998.
- E. Baum. Neural net algorithms that learn in polynomial time from examples and queries. IEEE Transactions on Neural Networks, 2(1):5–19, 1991.
- P. Long. On the sample complexity of PAC learning halfspaces against the uniform distribution. IEEE Transactions on Neural Networks, 6(6):1556–1559, 1995.
- P. Long. An upper bound on the sample complexity of PAC learning halfspaces with respect to the uniform distribution. Information Processing Letters, 2003.
- S. Hanneke. Theory of disagreement-based active learning. Foundations and Trends in Machine Learning, 7(2-3):131–309, 2014.
- D. Kane, S. Lovett, and S. Moran. Near-optimal linear decision trees for k-sum and related problems. In IEEE 50th Annual Symposium on Theory of Computation, 2018.
- E. Ezra and M. Sharir. A nearly quadratic bound for point-location in hyperplane arrangements, in the linear decision tree model. Discrete and Computational Geometry, pages 1–21, 2018.
- D. Kane, S. Lovett, and S. Moran. Generalized comparison trees for point-location problems. In Proceedings of the 45th International Colloquium on Automata, Languages and Programming, 2018.
- Sanjeev R Kulkarni, Sanjoy K Mitter, and John N Tsitsiklis. Active learning using arbitrary binary valued queries. Machine Learning, 11(1):23–35, 1993.
- L. Lovasz and S. Vempala. The geometry of logconcave functions and sampling algorithms. Random Structures and Algorithms, 2007.
- Adam R Klivans, Philip M Long, and Alex K Tang. Baum's algorithm learns intersections of halfspaces with respect to log-concave distributions. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 588–600.
- I. Barany. Random points and lattice points in convex bodies. The Bulletin, 45(3):339–365, 1994.
- I. Barany and Z. Furedi. Approximation of the sphere by polytopes having few vertices. Proceedings of the American Mathematical Society, 102(3):651–659, 1988.
- JA Wieacker. Einige probleme der polyedrischen approximation. Freiburg im Breisgau: Diplomarbeit, 1978.
- Imre Barany et al. Sylvester's question: The probability that n points are in convex position. The Annals of Probability, 27(4):2020–2034, 1999.
- Eric P Xing, Michael I Jordan, Stuart J Russell, and Andrew Y Ng. Distance metric learning with application to clustering with side-information. In Advances in neural information processing systems, pages 521–528, 2003.
- Matthew Schultz and Thorsten Joachims. Learning a distance metric from relative comparisons. In Advances in neural information processing systems, pages 41–48, 2004.
- Sameer Agarwal, Josh Wills, Lawrence Cayton, Gert Lanckriet, David Kriegman, and Serge Belongie. Generalized non-metric multidimensional scaling. In Artificial Intelligence and Statistics, pages 11–18, 2007.
- Brian McFee and Gert Lanckriet. Learning similarity in heterogeneous data. In Proceedings of the international conference on Multimedia information retrieval, pages 243–244. ACM, 2010.
- Kaizhu Huang, Yiming Ying, and Colin Campbell. Generalized sparse metric learning with relative comparisons. Knowledge and Information Systems, 28(1):25–45, 2011.
- Omer Tamuz, Ce Liu, Serge Belongie, Ohad Shamir, and Adam Tauman Kalai. Adaptively learning the crowd kernel. arXiv preprint arXiv:1105.1033, 2011.
- Buyue Qian, Xiang Wang, Fei Wang, Hongfei Li, Jieping Ye, and Ian Davidson. Active learning from relative queries. In Twenty-Third International Joint Conference on Artificial Intelligence, 2013.
- Pascal Massart, Elodie Nedelec, et al. Risk bounds for statistical learning. The Annals of Statistics, 34(5):2326–2366, 2006.
- Alexander B Tsybakov et al. Optimal aggregation of classifiers in statistical learning. The Annals of Statistics, 32(1):135–166, 2004.
- Pranjal Awasthi, Maria-Florina Balcan, Nika Haghtalab, and Ruth Urner. Efficient learning of linear separators under bounded noise. In Conference on Learning Theory, pages 167–190, 2015.
- Yining Wang and Aarti Singh. Noise-adaptive margin-based active learning and lower bounds under Tsybakov noise condition. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
- Aaditya Ramdas and Aarti Singh. Optimal rates for stochastic convex optimization under Tsybakov noise condition. In International Conference on Machine Learning, pages 365–373, 2013.
- DaoHong Xiang. Classification with Gaussians and convex loss II: improving error bounds by noise conditions. Science China Mathematics, 54(1):165–171, 2011.
- Pranjal Awasthi, Maria-Florina Balcan, Nika Haghtalab, and Hongyang Zhang. Learning and 1-bit compressed sensing under asymmetric noise. In Conference on Learning Theory, pages 152–192, 2016.
- Igor Carboni Oliveira and Rahul Santhanam. Conspiracies between learning algorithms, circuit lower bounds and pseudorandomness. 2017.
