ExtDict: Extensible Dictionaries for Data- and Platform-Aware Large-Scale Learning

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)(2017)

Abstract
This paper proposes ExtDict, a novel data- and platform-aware framework for iterative analysis and learning on massive, dense datasets. Iterative execution is prohibitively costly on distributed architectures, where the cost of moving data keeps growing relative to the cost of arithmetic. ExtDict builds a performance model that quantifies the cost of an iterative analysis algorithm on a target platform in terms of FLOPs, communication, and memory, which characterize runtime, energy, and storage, respectively. The core of ExtDict is a novel parametric data projection algorithm, called Extensible Dictionary, that produces versatile, sparse representations of the data to minimize this cost. We show that, by platform-aware tuning of the Extensible Dictionary parameters, ExtDict achieves the optimal performance objective under the quantified cost model. An accompanying API automates the application of ExtDict to various algorithms, datasets, and platforms. Proof-of-concept evaluations on massive, dense data across different platforms show more than an order of magnitude performance improvement over the state of the art, within guaranteed user-defined error bounds.
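To make the idea of platform-aware parameter tuning more concrete, the Python sketch below models one iteration of y = Aᵀ(Ax) on a dense m × n matrix A, both directly and through an assumed factorization A ≈ DC. Here D is an m × l dictionary, C is an l × n coefficient matrix with k nonzeros per column, and G = DᵀD is precomputed. The sketch then picks the cheapest (l, k) configuration that meets an error bound and a per-node memory budget. This is an illustrative assumption, not ExtDict's actual cost model or API. The names `Platform`, `Candidate`, `dense_cost`, `factored_cost`, and `tune` are hypothetical, as are several modeling choices: the log₂P tree-reduction communication term, the linear weighting of FLOPs and words, treating memory as a constraint, and supplying each candidate's approximation error up front.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical platform description; all weights are illustrative."""
    nodes: int            # number of compute nodes P
    cost_per_flop: float  # relative cost of one floating-point operation
    cost_per_word: float  # relative cost of moving one word between nodes
    mem_words: float      # per-node memory budget, in words


@dataclass(frozen=True)
class Candidate:
    """Hypothetical dictionary configuration for A ~ D @ C."""
    l: int        # number of dictionary columns (D is m x l)
    k: int        # nonzeros per column of the sparse coefficients C (l x n)
    error: float  # approximation error, assumed measured beforehand


def dense_cost(m, n, p):
    """Per-iteration (cost, per-node memory) of y = A.T @ (A @ x) on raw A."""
    flops = 4 * m * n                    # two dense matrix-vector products
    words = m * math.log2(p.nodes)       # tree reduction of the length-m A @ x
    mem = m * n / p.nodes                # per-node share of A (columns split)
    return p.cost_per_flop * flops + p.cost_per_word * words, mem


def factored_cost(m, n, c, p):
    """Per-iteration (cost, per-node memory) of y ~ C.T @ (G @ (C @ x))."""
    flops = 4 * c.k * n + 2 * c.l ** 2   # two sparse mat-vecs + one l x l mat-vec
    words = c.l * math.log2(p.nodes)     # tree reduction of the length-l C @ x
    # Per-node share of C (values + indices) plus a replicated copy of G;
    # D itself is not touched during iterations once G is precomputed.
    mem = 2 * c.k * n / p.nodes + c.l ** 2
    return p.cost_per_flop * flops + p.cost_per_word * words, mem


def tune(m, n, candidates, p, max_error):
    """Return the cheapest candidate within the error bound and memory budget."""
    feasible = []
    for c in candidates:
        cost, mem = factored_cost(m, n, c, p)
        if c.error <= max_error and mem <= p.mem_words:
            feasible.append((cost, c))
    if not feasible:
        return None  # no configuration satisfies the constraints
    return min(feasible, key=lambda item: item[0])[1]


if __name__ == "__main__":
    plat = Platform(nodes=8, cost_per_flop=1.0, cost_per_word=100.0, mem_words=1e8)
    m, n = 10_000, 100_000
    cands = [
        Candidate(l=200, k=10, error=0.05),
        Candidate(l=500, k=20, error=0.01),
        Candidate(l=2000, k=40, error=0.001),
    ]
    best = tune(m, n, cands, plat, max_error=0.02)
    base, _ = dense_cost(m, n, plat)
    print("chosen:", best)
    print("estimated per-iteration speedup:", base / factored_cost(m, n, best, plat)[0])
```

In this toy setting, the smallest dictionary is rejected because it violates the error bound, and the largest one is feasible but pays more for the l² term, so the middle configuration wins. Raising `cost_per_word` (a communication-bound platform) pushes the choice toward smaller l. This mirrors, in a simplified form, the trade-off the abstract describes.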
Keywords
Big Data, Machine Learning, Parallel Processing, Subspace Sampling