Feisu: Fast Query Execution Over Heterogeneous Data Sources On Large-Scale Clusters

2017 IEEE 33rd International Conference on Data Engineering (ICDE 2017)

Abstract
Fast data analytics at an increasingly large scale has become a critical task for any Internet service company. For example, at Baidu, the major search engine company in China, PB-scale volumes of Web and business data are continuously acquired and analyzed in a timely manner for purposes such as evaluating product revenue, tracking product demand in the market, predicting user behavior, updating product rankings, and diagnosing spam cases, among many others. The response time of data analytics queries not only affects user experience but also has a serious impact on the productivity of business operations. In this paper, to meet the challenge of fast data analytics, we present Feisu (meaning "fast" in Chinese), a data integration system over heterogeneous storage systems, which has been widely used in Baidu's critical, daily business analytics applications. Feisu is designed and implemented to work with several heterogeneous storage systems and to exploit the query similarity embedded in complex query workloads. Our experiments using real-world workloads show that Feisu can significantly improve query performance at Baidu. Feisu has been in production use at Baidu for two years, effectively managing dozens of petabytes of data for various applications.
Keywords
Feisu,fast query execution,heterogeneous data sources,large-scale clusters,data analytics,Internet service company,Baidu,search engine company,China,business data,PB-scale,product revenue,product demanding activities,user behavior,product rankings,spam cases,business operations,data integration system,heterogeneous storage systems,R&D efforts,query similarity,complex query workloads