## AI Insight

An AI-extracted summary of this paper.

# Training linear SVMs in linear time

KDD 2006, pp. 217–226

## Abstract

Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These applications involve a large number of examples n as well as a large number of features N, ...

## Introduction

- Many applications of machine learning deal with problems where both the number of features N and the number of examples n are large.
- Examples of such problems can be found in text classification, word-sense disambiguation, and drug design.
- While problems of this size seem daunting at first glance, the examples above have extremely sparse feature vectors, which gives hope that they can be handled efficiently (a toy sketch of sparse inner products follows this list).
- While there are training methods that scale linearly in n (e.g. [18, 7, 6, 15]), such methods empirically scale quadratically with the number of features N.
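To make the sparsity point concrete, here is a toy Python sketch (ours, not the paper's) of why per-example cost depends on the number of non-zeros s rather than on N: with an example stored as index/value pairs, an inner product with a dense weight vector touches only the non-zero coordinates.

```python
# Toy illustration: an inner product with a sparse example costs O(s),
# where s is the number of non-zero features, independent of N.
def sparse_dot(x: dict[int, float], w: list[float]) -> float:
    """Inner product of a sparse example x with a dense weight vector w."""
    return sum(v * w[i] for i, v in x.items())

w = [0.0] * 1_000_000                      # N = 10^6 features
x = {3: 1.0, 41_000: 0.5, 999_999: 2.0}    # s = 3 non-zeros
print(sparse_dot(x, w))                    # touches only 3 coordinates
```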

## Highlights

- We propose the first general training algorithm for linear Support Vector Machines that provably scales as O(sn) for classification and O(sn log n) for ordinal regression, where s is the average number of non-zero features per example.
- We present a simple cutting-plane algorithm for training linear Support Vector Machines that is shown to converge in time O(sn) for classification and O(sn log n) for ordinal regression; a minimal sketch of the classification case follows this list.
- It is based on an alternative formulation of the SVM optimization problem that exhibits a different form of sparsity than the conventional formulation.
- The algorithm can in principle be applied to SVMs with kernels.
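As a rough illustration of the classification case, here is a minimal Python sketch of a one-slack cutting-plane loop in the spirit of the paper's Algorithm 1. It assumes dense NumPy inputs and labels in {-1, +1}, and substitutes SciPy's general-purpose SLSQP routine for the dedicated QP solver the authors call; the function and variable names are ours, not theirs.

```python
import numpy as np
from scipy.optimize import minimize

def cutting_plane_svm(X, y, C=1.0, eps=1e-3, max_iters=100):
    """Toy one-slack cutting-plane trainer for a linear SVM classifier."""
    n, N = X.shape
    G, d = [], []                                 # working set: normals, offsets
    w, xi = np.zeros(N), 0.0
    for _ in range(max_iters):
        # Most violated constraint: select example i iff its margin is < 1.
        c = (y * (X @ w) < 1).astype(float)
        g = (X * (c * y)[:, None]).mean(axis=0)   # (1/n) sum_i c_i y_i x_i
        delta = c.mean()                          # (1/n) sum_i c_i
        if delta - w @ g <= xi + eps:             # violated by less than eps?
            break
        G.append(g); d.append(delta)
        A, b = np.array(G), np.array(d)
        # Dual of  min 1/2||w||^2 + C*xi  over the working set:
        #   max_{a >= 0, sum(a) <= C}  a.b - 1/2 ||A^T a||^2
        H = A @ A.T
        cons = [{"type": "ineq", "fun": lambda a: C - a.sum()}]
        a = minimize(lambda a: 0.5 * a @ H @ a - a @ b,
                     np.full(len(b), C / len(b)),
                     bounds=[(0.0, C)] * len(b),
                     constraints=cons, method="SLSQP").x
        w = A.T @ a                               # primal w from dual variables
        xi = max(0.0, float((b - A @ w).max()))   # tightest feasible slack
    return w
```

The key property mirrored here is that each iteration adds only one aggregated constraint, so the QP over the working set stays tiny even when n is in the millions; the O(sn) cost per iteration comes from the margin scan.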

## Methods

- While Theorems 4 and 6 characterize the asymptotic scaling of Algorithms 1 and 2, the behavior for small sample sizes may be different.
- The authors empirically analyze the scaling behavior in the following experiments, as well as its sensitivity to C and ε.
- The authors implemented Algorithms 1 and 2 using SVM-Light as the basic quadratic programming software that is called in Line 4 of each algorithm.
- Other quadratic programming tools would work just as well, since |W| remained small in all the experiments.
- The authors refer to the implementation of Algorithms 1 and 2 as SVM-Perf in the following; a toy smoke test of the sketch above appears after this list.
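SVM-Perf itself is a compiled C program driven from the command line; purely to exercise the toy sketch from the Highlights section, a hypothetical smoke test on synthetic data might look like this:

```python
# Hypothetical smoke test of the toy cutting_plane_svm sketch above
# (not SVM-Perf): train on linearly separable synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))
w_true = rng.normal(size=40)
y = np.sign(X @ w_true)                     # labels in {-1, +1}

w = cutting_plane_svm(X, y, C=1000.0, eps=1e-3)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```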

## Results

- For the value of C, the authors use the setting that achieves the best performance on the test set when using the full training set (C = 10,000 for Reuters CCAT, C = 50,000 for Reuters C11, C = 20,000 for Arxiv astro-ph, C = 1,000,000 for Covertype 1, and C = 20,000 for KDD04 Physics).

## Conclusion

- The authors presented a simple cutting-plane algorithm for training linear SVMs that is shown to converge in time O(sn) for classification and O(sn log n) for ordinal regression.
- It is based on an alternative formulation of the SVM optimization problem (written out after this list) that exhibits a different form of sparsity than the conventional formulation.
- While a straightforward implementation of the kernelized variant is slower by a factor of n, matrix approximation techniques and the use of sampling might overcome this problem.
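For reference, the alternative formulation mentioned above replaces the n slack variables of the conventional SVM with a single slack ξ shared across exponentially many aggregated constraints; as summarized from the paper's description, it can be written as:

```latex
\min_{\mathbf{w},\,\xi \ge 0} \;\; \frac{1}{2}\,\mathbf{w}^{\top}\mathbf{w} + C\,\xi
\qquad \text{s.t.} \quad
\forall\, \mathbf{c} \in \{0,1\}^{n}:\;
\frac{1}{n}\,\mathbf{w}^{\top}\!\sum_{i=1}^{n} c_i\, y_i\, \mathbf{x}_i
\;\ge\; \frac{1}{n}\sum_{i=1}^{n} c_i \;-\; \xi
```

Although the constraint set has size 2^n, the cutting-plane loop only ever instantiates the few most violated constraints, which is the different form of sparsity the formulation exhibits.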

Table 1: Training time in CPU-seconds.

## Funding

- This research was supported under NSF Award IIS-0412894 and through a gift from Google.

## Study subjects and analysis

Datasets: 5

SVM-Perf is available at http://svmlight.joachims.org. We use 5 datasets in our experiments, selected to cover a wide range of properties.
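SVM-Perf reads the plain-text SVM-Light example format (one example per line: a target followed by ascending, 1-based `index:value` pairs). A minimal writer for that format, assuming sparse examples stored as dicts, might look like:

```python
# Minimal writer for the SVM-Light / SVM-Perf training-file format:
# "<label> <index>:<value> ..." per line, indices 1-based and ascending.
def write_svmlight(path, examples, labels):
    with open(path, "w") as f:
        for x, y in zip(examples, labels):
            feats = " ".join(f"{i}:{v}" for i, v in sorted(x.items()))
            f.write(f"{int(y)} {feats}\n")

write_svmlight("train.dat",
               [{1: 0.5, 7: 1.0}, {2: 1.3}],
               [+1, -1])
```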

## References

- [1] R. Caruana, T. Joachims, and L. Backstrom. KDD-Cup 2004: Results and analysis. ACM SIGKDD Explorations Newsletter, 6(2):95–108, 2004.
- [2] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
- [3] R. Collobert and S. Bengio. SVMTorch: Support vector machines for large-scale regression problems. Journal of Machine Learning Research (JMLR), 1:143–160, 2001.
- [4] J. Díez, J. del Coz, and A. Bahamonde. A support vector method for ranking minimizing the number of swapped pairs. Technical report, Artificial Intelligence Centre, Universidad de Oviedo at Gijón, 2006.
- [5] S. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In Proceedings of ACM CIKM, November 1998.
- [6] M. Ferris and T. Munson. Interior-point methods for massive support vector machines. SIAM Journal on Optimization, 13(3):783–804, 2003.
- [7] G. Fung and O. Mangasarian. Proximal support vector machine classifiers. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2001.
- [8] R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers, pages 115–132. MIT Press, Cambridge, MA, 2000.
- [9] D. Hush and C. Scovel. Polynomial-time decomposition algorithms for support vector machines. Machine Learning, 51:51–71, 2003.
- [15] S. Keerthi and D. DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research (JMLR), 6:341–361, 2005.
- [17] D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research (JMLR), 5:361–397, 2004.
- [18] O. Mangasarian and D. Musicant. Lagrangian support vector machines. Journal of Machine Learning Research (JMLR), 1:161–177, 2001.
- [21] B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
- [22] B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12:1207–1245, 2000.
- [23] I. Tsang, J. Kwok, and P.-M. Cheung. Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research (JMLR), 6:363–392, 2005.
- [24] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research (JMLR), 6:1453–1484, September 2005.
