Speeding up k-Nearest Neighbors Classifier for Large-Scale Multi-Label Learning on GPUs

Neurocomputing (2019)

Abstract
Multi-label classification is one of the most dynamically growing fields of machine learning, owing to its numerous real-life applications to problems that are described by multiple labels at the same time. While most works in this field focus on proposing novel and accurate classification algorithms, the issue of computational complexity on growing dataset sizes is somewhat marginalized. Owing to the ever-increasing capabilities of data capturing, we are faced with the problem of large-scale data mining, which forces learners to be not only highly accurate, but also fast and scalable in high-dimensional spaces of instances, features, and labels. In this paper, we propose a highly efficient parallel approach for computing the multi-label k-Nearest Neighbors classifier on GPUs. While this method is highly effective due to its accuracy and simplicity, its computational complexity makes it prohibitive for large-scale data. We propose a four-step implementation that takes advantage of the GPU architecture, allowing for efficient execution of the multi-label k-Nearest Neighbors classifier without any loss of accuracy. Experiments carried out on a number of real and artificial benchmarks show that we are able to achieve speedups of up to 200 times compared to a sequential CPU execution, while scaling efficiently with varying numbers of instances and features.
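The abstract does not spell out the four-step GPU implementation, so as a rough illustration of the kind of kernel such an approach builds on, the sketch below computes the pairwise squared Euclidean distance matrix between query and training instances on the GPU, with one thread per (query, training) pair. This is only the distance-computation stage; the subsequent steps of a multi-label kNN classifier (per-query top-k selection and ML-kNN-style label assignment) are omitted, and all names and sizes here are hypothetical rather than taken from the paper.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One thread per (query, training) pair: squared Euclidean distance.
__global__ void pairwiseDist(const float *train, const float *query,
                             float *dist, int nTrain, int nQuery, int dim) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;  // training instance index
    int q = blockIdx.y * blockDim.y + threadIdx.y;  // query instance index
    if (t < nTrain && q < nQuery) {
        float acc = 0.0f;
        for (int d = 0; d < dim; ++d) {
            float diff = train[t * dim + d] - query[q * dim + d];
            acc += diff * diff;
        }
        dist[q * nTrain + t] = acc;
    }
}

int main() {
    // Hypothetical sizes, not from the paper's benchmarks.
    const int nTrain = 1024, nQuery = 256, dim = 32;
    size_t trainBytes = (size_t)nTrain * dim * sizeof(float);
    size_t queryBytes = (size_t)nQuery * dim * sizeof(float);
    size_t distBytes  = (size_t)nQuery * nTrain * sizeof(float);

    // Deterministic placeholder data standing in for a real multi-label dataset.
    float *hTrain = (float*)malloc(trainBytes);
    float *hQuery = (float*)malloc(queryBytes);
    float *hDist  = (float*)malloc(distBytes);
    for (int i = 0; i < nTrain * dim; ++i) hTrain[i] = (i % 7) * 0.1f;
    for (int i = 0; i < nQuery * dim; ++i) hQuery[i] = (i % 5) * 0.2f;

    float *dTrain, *dQuery, *dDist;
    cudaMalloc(&dTrain, trainBytes);
    cudaMalloc(&dQuery, queryBytes);
    cudaMalloc(&dDist, distBytes);
    cudaMemcpy(dTrain, hTrain, trainBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dQuery, hQuery, queryBytes, cudaMemcpyHostToDevice);

    // 2D grid: x covers training instances, y covers query instances.
    dim3 block(16, 16);
    dim3 grid((nTrain + block.x - 1) / block.x, (nQuery + block.y - 1) / block.y);
    pairwiseDist<<<grid, block>>>(dTrain, dQuery, dDist, nTrain, nQuery, dim);
    cudaMemcpy(hDist, dDist, distBytes, cudaMemcpyDeviceToHost);

    printf("dist(query 0, train 0) = %f\n", hDist[0]);

    cudaFree(dTrain); cudaFree(dQuery); cudaFree(dDist);
    free(hTrain); free(hQuery); free(hDist);
    return 0;
}
```

In a full multi-label kNN pipeline, the resulting nQuery x nTrain distance matrix would be followed by a per-query k-smallest selection and a label-counting or Bayesian posterior step over the neighbors' label sets; those stages are where a GPU-specific design such as the paper's four-step scheme differs from this naive sketch.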
Keywords
Machine learning, Multi-label classification, GPU computing, Large-scale data mining