Distributed Matrix Multiplication Performance Estimator for Machine Learning Jobs in Cloud Computing

2018 IEEE 11th International Conference on Cloud Computing (CLOUD)(2018)

Abstract
Matrix multiplication is an important kernel task in many machine learning algorithms. As the size of input datasets increases, multiple workloads are analyzed in large-scale distributed cloud computing environments. Therefore, understanding the characteristics of a distributed matrix multiplication task is essential for running machine learning jobs in the cloud. Herein, we propose Matrix multiplication Performance Estimator for Cloud computing, a method to predict the latency of matrix multiplication of various sizes and shapes in a distributed cloud computing environment. We first characterize the overhead of a distributed matrix multiplication task and propose features to model the latency of a task with different input types. Using the proposed features, a latency prediction model is developed by applying a data mining algorithm and a parameter optimization step iteratively. In experiments with 236 distinct types of matrix multiplications on diverse cloud instances running Apache Spark, we confirm that the proposed method can model the latency of various types of matrix multiplication tasks effectively and capture the non-linear interactions among the proposed features. A comparison with the state-of-the-art cloud computing performance predictor, Ernest, reveals that the proposed method provides 63% lower Root Mean Square Error (RMSE) for a distributed matrix multiplication latency prediction task and confirms the uniqueness of the distributed matrix multiplication workload.
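The headline comparison above is stated in terms of Root Mean Square Error (RMSE). As a minimal, self-contained illustration of how such a reduction is computed, the sketch below evaluates two hypothetical latency predictors on synthetic data; all numbers, and the predictor names, are invented for illustration and do not reproduce the paper's model or measurements:

```python
import math

def rmse(y_true, y_pred):
    # Root Mean Square Error: sqrt(mean of squared prediction errors)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Synthetic task latencies in seconds -- invented for illustration only
actual  = [10.0, 20.0, 40.0, 80.0]
model_a = [11.0, 19.0, 42.0, 78.0]   # hypothetical proposed predictor
model_b = [14.0, 26.0, 33.0, 90.0]   # hypothetical baseline predictor

# Relative RMSE reduction of model_a over model_b
improvement = 1 - rmse(actual, model_a) / rmse(actual, model_b)
print(f"RMSE A: {rmse(actual, model_a):.2f}, "
      f"RMSE B: {rmse(actual, model_b):.2f}, "
      f"reduction: {improvement:.0%}")
```

A "63% lower RMSE" claim corresponds to `improvement` being 0.63 when computed over the paper's 236 matrix-multiplication workloads.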
Keywords
Spark,Distributed Matrix Multiplication,Machine Learning,Cloud Computing,Prediction Model