Accelerating a Distributed CPD Algorithm for Large Dense, Skewed Tensors

2018 IEEE International Conference on Big Data (Big Data)

Abstract
Canonical Polyadic Decomposition (CPD) is a powerful technique for uncovering multilinear relationships in tensors. Current research in scalable CPD has focused on designing efficient decomposition algorithms for large sparse tensors that arise in machine learning and data mining applications. This work addresses the complementary need for efficient decomposition algorithms for large dense tensors that arise in signal processing applications. Such tensors are often highly skewed, with one mode (e.g., time) orders of magnitude larger than the others. We present an algorithm appropriate for MapReduce settings that uses both regularization and sketching to efficiently operate on such tensors. We have open-sourced an Apache Spark implementation of the algorithm and evaluate it on synthetic and real datasets to characterize the trade-offs in runtime and accuracy when using different types and combinations of regularization and sketching. We observe that a combination of random entry sketching plus Tikhonov regularization works best independently of the type or level of noise in the tensor. Similarly, we find that random entry sketching plus proximal regularization works best for ill-conditioned tensors. Further experiments demonstrate that the runtime scales sublinearly with the tensor size and highly sublinearly with the tensor rank. The use of regularization and sketching yields runtimes 42–112× faster than those of the previous state-of-the-art MapReduce CPD implementation for large dense, skewed tensors, while having a negligible impact on the accuracy of the decompositions.
Keywords
Canonical Polyadic Decomposition, PARAFAC, CP, tensor, tensor decomposition, distributed computing, MapReduce, Apache Spark, regularization, sketching