TencentBoost: A Gradient Boosting Tree System with Parameter Server

2017 IEEE 33rd International Conference on Data Engineering (ICDE), 2017

Abstract
Gradient boosting tree (GBT), a widely used machine learning algorithm, achieves state-of-the-art performance in academia, industry, and data analytics competitions. Although existing scalable systems that implement GBT, such as XGBoost and MLlib, perform well on datasets with medium-dimensional features, they can suffer performance degradation in many industrial applications where the training datasets contain high-dimensional features. The degradation derives from their inefficient mechanisms for model aggregation, either MapReduce or all-reduce. To address this high-dimensional problem, we propose a scalable execution plan using the parameter server architecture to facilitate the model aggregation. Further, we introduce a sparse-pull method and an efficient index structure to increase the processing speed. We implement a GBT system, namely TencentBoost, in the production cluster of Tencent Inc. The empirical results show that our system is 2-20× faster than existing platforms.
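The aggregation idea in the abstract can be illustrated with a toy, in-process sketch. It is not TencentBoost's code or API: the class `ToyParameterServer`, its methods, and the modulo sharding rule are all hypothetical, and "sparse pull" is interpreted here only as fetching a requested subset of feature indices, which is an assumption because the abstract does not define the method. Workers push sparse per-feature gradient sums, each shard aggregates the features it owns, and a worker then pulls back only the entries it asks for instead of the full high-dimensional vector.

```python
from collections import defaultdict


class ToyParameterServer:
    """In-process stand-in for a set of server shards (hypothetical, illustration only)."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        # Each shard stores aggregated values for the features it owns.
        self.shards = [defaultdict(float) for _ in range(num_shards)]

    def _shard_of(self, feature):
        # Assumed partitioning rule: feature index modulo the shard count.
        return feature % self.num_shards

    def push(self, local_grads):
        """Add a worker's sparse {feature: gradient_sum} map into the owning shards."""
        for feature, value in local_grads.items():
            self.shards[self._shard_of(feature)][feature] += value

    def sparse_pull(self, features):
        """Return aggregated values for the requested features only."""
        return {f: self.shards[self._shard_of(f)].get(f, 0.0) for f in features}


ps = ToyParameterServer(num_shards=4)
ps.push({3: 0.5, 999_999: -1.0})   # worker 1
ps.push({3: 0.25, 42: 2.0})        # worker 2
result = ps.sparse_pull([3, 42, 7])
print(result)  # {3: 0.75, 42: 2.0, 7: 0.0}
```

In this sketch the cost of a pull grows with the number of requested features rather than with the total feature dimension, which is the kind of saving a sparse pull is presumably meant to provide.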
Keywords
gradient boosting tree system, TencentBoost, machine learning algorithm, data analytics, GBT, medium-dimensional features, performance degradation, high-dimensional features, model aggregation, parameter server architecture, sparse-pull method, index structure, GBT system