Distributed Nonlinear Semiparametric Support Vector Machine For Big Data Applications On Spark Frameworks

IEEE Transactions on Systems, Man, and Cybernetics (2020)

Abstract
In recent years, there has been a noticeable increase in the number of available Big Data infrastructures. This has encouraged the adaptation of traditional machine learning techniques so that they can address large-scale problems in distributed environments. Kernel methods such as support vector machines (SVMs) suffer from scalability problems due to their nonparametric nature and the complexity of their training procedures. In this paper, we propose a new and efficient distributed implementation of a training procedure for nonlinear semiparametric (budgeted) SVMs, called distributed iterative reweighted least squares (IRWLS). The algorithm uses k-means to select the centroids of the semiparametric model and a new distributed implementation of the IRWLS optimization procedure to find the weights of the model. We have implemented the proposed algorithm in Apache Spark and benchmarked it against other state-of-the-art methods, both full SVM (p-pack SVM) and budgeted (budgeted stochastic gradient descent). Experimental results show that the proposed algorithm achieves higher accuracy while controlling the size of the final model, and offers high performance in terms of run time and efficiency when processing very large datasets: the computation time grows linearly with the number of training patterns.
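The abstract describes a two-stage scheme: k-means picks a budget of centroids, and a distributed optimization step then fits the weights of the semiparametric model on Spark. The PySpark sketch below illustrates that structure only; the toy data, column names, RBF kernel width, regularization constant, and the single regularized least-squares solve (standing in for the paper's full IRWLS iterations) are all assumptions, not the authors' implementation.

```python
# Minimal PySpark sketch of a budgeted (semiparametric) kernel model:
# Stage 1 selects centroids with distributed k-means, Stage 2 aggregates the
# small normal-equation system across the cluster and solves it on the driver.
import numpy as np
from pyspark.sql import SparkSession
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("semiparametric-svm-sketch").getOrCreate()

# Toy data: two numeric features and a +1/-1 label (assumed layout).
df = spark.createDataFrame(
    [(0.1, 1.2, 1.0), (0.3, 0.9, 1.0), (2.1, 2.2, -1.0), (2.4, 1.9, -1.0)],
    ["x1", "x2", "label"],
)
df = VectorAssembler(inputCols=["x1", "x2"], outputCol="features").transform(df)

# Stage 1: choose the centroids (the "budget") of the semiparametric model.
budget = 2
centroids = np.array(KMeans(k=budget, seed=1).fit(df).clusterCenters())

gamma = 1.0  # assumed RBF kernel width

def row_stats(row):
    # Per-row contribution to the normal equations H^T H and H^T y, where H
    # holds the kernel values between each sample and the selected centroids.
    x = row["features"].toArray()
    h = np.exp(-gamma * np.sum((centroids - x) ** 2, axis=1))
    return np.outer(h, h), h * row["label"]

# Stage 2: aggregate the (budget x budget) system across partitions; only
# budget-sized matrices reach the driver, so the distributed cost grows
# linearly with the number of training patterns.
HtH, Hty = df.rdd.map(row_stats).reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))

reg = 1e-3  # assumed regularization constant
weights = np.linalg.solve(HtH + reg * np.eye(budget), Hty)
print("model weights:", weights)
```

In the paper's actual procedure, the least-squares step above would be repeated inside IRWLS iterations with sample-dependent weights derived from the SVM hinge loss; the sketch keeps a single solve to show the data flow on Spark.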
Keywords
Support vector machines, Training, Kernel, Big Data, Sparks, Optimization, Computational modeling, budgeted, distributed, Spark, support vector machine (SVM)