Tupleware: "Big" Data, Big Analytics, Small Clusters.

CIDR (2015)

Citations: 107 | Views: 208
Abstract
There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the challenges of the Googles and Facebooks of the world: processing petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users analyze relatively small datasets of up to several terabytes in size, perform primarily compute-intensive operations, and operate clusters ranging from only a few to a few dozen nodes. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes our vision for the design of TUPLEWARE, a new system specifically aimed at complex analytics on small clusters. TUPLEWARE’s architecture brings together ideas from the database and compiler communities to create a powerful end-to-end solution for data analysis that compiles workflows of user-defined functions into distributed programs. Our preliminary results show performance improvements of up to three orders of magnitude over alternative systems.