Training linear SVMs in linear time

T. Joachims. KDD, pp. 217–226, 2006


Abstract

Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These applications involve a large number of examples n as well as a large number of features N, …


Introduction
  • Many applications of machine learning deal with problems where both the number of features N and the number of examples n are large.
  • Examples of such problems can be found in text classification, word-sense disambiguation, and drug design.
  • While problems of this size seem daunting at first glance, the examples mentioned above have extremely sparse feature vectors, which gives hope that these problems can be handled efficiently; the sketch after this list illustrates why sparsity matters.
  • While there are training methods that scale linearly in n (e.g. [18, 7, 6, 15]), such methods empirically scale quadratically with the number of features N.
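
To make the role of sparsity concrete, here is a minimal sketch of why the cost of computing all n margins depends on the average number s of non-zero features rather than on N. The use of scipy.sparse and the chosen sizes are illustrative assumptions, not the paper's setup:

    import numpy as np
    from scipy.sparse import random as sparse_random

    # Illustrative sizes: many examples (n) and many features (N),
    # but only about s = 100 non-zero features per example.
    n, N = 10_000, 100_000
    X = sparse_random(n, N, density=0.001, format="csr")
    w = np.random.randn(N)

    # A CSR matrix-vector product touches only the stored non-zeros,
    # so computing all n margins costs O(sn) rather than O(Nn).
    margins = X @ w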
Highlights
  • Many applications of machine learning deal with problems where both the number of features N and the number of examples n are large.
  • We propose the first general training algorithm for linear Support Vector Machines that provably scales O(sn) for classification and O(sn log(n)) for ordinal regression, where s is the average number of non-zero features.
  • We presented a simple Cutting-Plane Algorithm for training linear Support Vector Machines that is shown to converge in time O(sn) for classification and O(sn log(n)) for ordinal regression.
  • It is based on an alternative formulation of the Support Vector Machines optimization problem that exhibits a different form of sparsity compared to the conventional formulation; this formulation is sketched just after this list.
  • The algorithm can in principle be applied to Support Vector Machines with kernels.
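
Concretely, the alternative formulation replaces the n slack variables of the conventional SVM with a single slack ξ shared across 2^n constraints, one per vector c ∈ {0,1}^n selecting a subset of examples. A sketch for classification, reconstructed from the paper's description (the 1/n normalization is our reading):

    \min_{\mathbf{w},\,\xi \ge 0} \;\; \frac{1}{2}\,\|\mathbf{w}\|^2 + C\,\xi
    \qquad \text{s.t.} \quad
    \forall\, \mathbf{c} \in \{0,1\}^n:\;\;
    \frac{1}{n}\, \mathbf{w}^{\top} \sum_{i=1}^{n} c_i\, y_i\, \mathbf{x}_i
    \;\ge\; \frac{1}{n} \sum_{i=1}^{n} c_i \;-\; \xi

The cutting-plane algorithm never enumerates these exponentially many constraints; it repeatedly adds only the currently most violated one, which is why the working set stays small in practice.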
Methods
  • While Theorems 4 and 6 characterize the asymptotic scaling of Algorithms 1 and 2, the behavior for small sample sizes may be different.
  • The authors empirically analyze the scaling behavior in the following experiments, as well as its sensitivity to C and ε.
  • The authors implemented Algorithms 1 and 2 using SVM-Light as the basic quadratic programming software that is called in Line 4 of each algorithm.
  • Other quadratic programming tools would work just as well, since |W| remained small in all the experiments; a sketch of the overall cutting-plane loop follows this list.
  • The authors refer to the implementation of Algorithms 1 and 2 as SVM-Perf in the following.
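
A minimal Python sketch of such a loop is below. It follows the spirit of Algorithm 1 for classification but is not the authors' SVM-Perf code: the function name train_linear_svm_cutting_plane, the use of scipy's SLSQP solver in place of SVM-Light for the small QP in Line 4, and the dense data layout are all illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize

    def train_linear_svm_cutting_plane(X, y, C=1.0, eps=0.1, max_iter=100):
        """Cutting-plane training of a linear SVM (sketch in the spirit of
        Algorithm 1). X: (n, N) array, dense here for brevity; the O(sn)
        bound relies on sparse X. y: labels in {-1, +1}."""
        n, N = X.shape
        G_list, d_list = [], []   # working set W: constraint vectors and offsets
        w, xi = np.zeros(N), 0.0

        for _ in range(max_iter):
            # Most violated constraint of the 1-slack formulation:
            # c_i = 1 iff example i violates the margin. This step is O(sn).
            c = (y * (X @ w) < 1.0).astype(float)
            g = (c * y) @ X / n        # (1/n) sum_i c_i y_i x_i
            d = c.sum() / n            # (1/n) sum_i c_i

            # Stop once the new constraint is violated by at most eps.
            if d - w @ g <= xi + eps:
                break
            G_list.append(g)
            d_list.append(d)
            G, dvec = np.array(G_list), np.array(d_list)

            # Small dual QP over the working set:
            #   max_a  dvec.a - 0.5 ||G^T a||^2   s.t.  a >= 0, sum(a) <= C
            H = G @ G.T
            res = minimize(lambda a: 0.5 * a @ H @ a - dvec @ a,
                           np.zeros(len(d_list)),
                           bounds=[(0, None)] * len(d_list),
                           constraints=[{"type": "ineq",
                                         "fun": lambda a: C - a.sum()}],
                           method="SLSQP")
            w = G.T @ res.x                       # primal w from dual variables
            xi = max(0.0, np.max(dvec - G @ w))   # shared slack

        return w

A toy run (names as defined in the sketch):

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    y = np.sign(X @ rng.normal(size=50))
    w = train_linear_svm_cutting_plane(X, y, C=100.0, eps=0.01)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))

Because the working set grows by only one constraint per iteration and the iteration bound does not depend on n, the QP stays small, mirroring the observation above that |W| remained small in all the experiments.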
Results
  • For the value of C, the authors use the setting that achieves the best performance on the test set when using the full training set (C = 10,000 for Reuters CCAT, C = 50,000 for Reuters C11, C = 20,000 for Arxiv astro-ph, C = 1,000,000 for Covertype 1, and C = 20,000 for KDD04 Physics).
Conclusion
  • The authors presented a simple Cutting-Plane Algorithm for training linear SVMs that is shown to converge in time O(sn) for classification and O(sn log(n)) for ordinal regression.
  • It is based on an alternative formulation of the SVM optimization problem that exhibits a different form of sparsity compared to the conventional formulation.
  • While a straightforward implementation for SVMs with kernels is slower by a factor of n, matrix approximation techniques and the use of sampling might overcome this problem.
Tables
  • Table 1: Training time in CPU-seconds
Funding
  • This research was supported under NSF Award IIS-0412894 and through a gift from Google.
Study subjects and analysis
datasets: 5
SVM-Perf is available at http://svmlight.joachims.org. We use 5 datasets in our experiments, selected to cover a wide range of properties.

Reference
  • [1] R. Caruana, T. Joachims, and L. Backstrom. KDD-Cup 2004: Results and analysis. ACM SIGKDD Newsletter, 6(2):95–108, 2004.
  • [2] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  • [3] R. Collobert and S. Bengio. SVMTorch: Support vector machines for large-scale regression problems. Journal of Machine Learning Research (JMLR), 1:143–160, 2001.
  • [4] J. Díez, J. del Coz, and A. Bahamonde. A support vector method for ranking minimizing the number of swapped pairs. Technical report, Artificial Intelligence Centre, Universidad de Oviedo at Gijón, 2006.
  • [5] S. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In Proceedings of ACM-CIKM98, November 1998.
  • [6] M. Ferris and T. Munson. Interior-point methods for massive support vector machines. SIAM Journal of Optimization, 13(3):783–804, 2003.
  • [7] G. Fung and O. Mangasarian. Proximal support vector classifiers. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2001.
  • [8] R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers, pages 115–132. MIT Press, Cambridge, MA, 2000.
  • [9] D. Hush and C. Scovel. Polynomial-time decomposition algorithms for support vector machines. Machine Learning, 51:51–71, 2003.
  • [15] S. Keerthi and D. DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research (JMLR), 6:341–361, 2005.
  • [17] D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research (JMLR), 5:361–397, 2004.
  • [18] O. Mangasarian and D. Musicant. Lagrangian support vector machines. Journal of Machine Learning Research (JMLR), 1:161–177, 2001.
  • [21] B. Schoelkopf and A. J. Smola. Learning with Kernels. The MIT Press, Cambridge, MA, 2002.
  • [22] B. Scholkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12:1207–1245, 2000.
  • [23] I. Tsang, J. Kwok, and P.-M. Cheung. Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research (JMLR), 6:363–392, 2005.
  • [24] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research (JMLR), 6:1453–1484, September 2005.