Sampling Based Histogram PCA and Its Mapreduce Parallel Implementation on Multicore.

SYMMETRY-BASEL(2018)

Abstract
In existing principal component analysis (PCA) methods for histogram-valued symbolic data, projection results are approximated using Moore's algebra and fail to reflect the data's true structure, mainly because there is no precise, unified method for computing linear combinations of histogram data. In this paper, we propose a new PCA method for histogram data that differs from well-established methods in that it projects observations onto the space spanned by the principal components more accurately and rapidly, by sampling within a MapReduce framework. Like the existing literature, the new histogram PCA method assumes orthogonal dimensions for every observation. To project observations, the method first samples from the original histogram variables to obtain single-valued data, on which linear combination operations can be performed. The projection of each observation is then given by the linear combination of the loading vectors and the single-valued samples, which closely approximates the exact projection. Finally, the projected samples are summarized back into histogram data. These procedures involve complex algorithms and large-scale data, which makes the new method time-consuming. To speed it up, we implement the method in parallel in a multicore MapReduce framework. A simulation study and an empirical study confirm that the new method is effective and time-saving.
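The sample, project, and summarize steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each histogram is stored as a pair of bin edges and bin probabilities, that values are uniformly distributed within a bin, that the variables of one observation are sampled independently, and that the variable means and loading vectors have already been obtained by some histogram PCA step that is not shown. The function names `sample_histogram` and `project_observation` and all parameters are hypothetical.

```python
import numpy as np


def sample_histogram(edges, probs, n, rng):
    """Draw n single values from one histogram.

    Assumption: a bin is chosen with its probability, then a value is
    drawn uniformly inside that bin.
    """
    edges = np.asarray(edges, dtype=float)
    probs = np.asarray(probs, dtype=float)
    bins = rng.choice(len(probs), size=n, p=probs / probs.sum())
    return rng.uniform(edges[bins], edges[bins + 1])


def project_observation(histograms, means, loadings, n_samples, n_bins, rng):
    """Project one histogram-valued observation onto principal components.

    histograms: list of (edges, probs) pairs, one per variable.
    means:      per-variable centers, shape (p,); assumed to be given.
    loadings:   loading vectors as columns, shape (p, k); assumed to be given.
    Returns a list of (edges, probs) pairs, one per component.
    """
    # Step 1: sample single-valued data, each variable independently.
    samples = np.column_stack(
        [sample_histogram(e, p, n_samples, rng) for e, p in histograms]
    )
    # Step 2: linear combination of loading vectors and centered samples.
    scores = (samples - means) @ loadings
    # Step 3: summarize the projected samples back into histograms.
    result = []
    for k in range(scores.shape[1]):
        counts, edges = np.histogram(scores[:, k], bins=n_bins)
        result.append((edges, counts / counts.sum()))
    return result
```

For example, calling `project_observation(hists, np.zeros(2), np.eye(2), 1000, 5, np.random.default_rng(0))` on two histogram variables returns two five-bin histograms. In this sketch each observation is handled independently of the others, so the calls could presumably be distributed as a map step, although the abstract does not say how the authors partition the work.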
Keywords
histogram-valued symbolic data, principal component analysis, sampling, MapReduce, parallel