Characterizing and accelerating indexing techniques on distributed ordered tables

2017 IEEE International Conference on Big Data (Big Data), 2017

Abstract
In recent years, most Web 2.0/3.0 applications have been built on top of distributed systems that allow data to be modeled as Distributed Ordered Tables (DOTs), such as Apache HBase. To analyze the stored data, SQL-like range queries over a DOT are a fundamental requirement. However, range queries over existing DOT implementations are highly inefficient. Several secondary index techniques have been proposed to alleviate this issue, but they introduce additional overhead when creating and updating the index. Moreover, index techniques introduce several additional challenges for DOTs, particularly in network communication and in the thread models used for concurrent request processing. In this paper, we first characterize the performance of index techniques on DOTs from a networking perspective. We then propose an RDMA-based high-performance communication framework, which uses HBase as the underlying DOT implementation, to accelerate these techniques. We propose several thread models for our RDMA-based design and compare their performance. We design a parallel insert operation to reduce index creation overhead. We also design several benchmarks to evaluate DOT-based systems. Experimental evaluations with state-of-the-art index techniques (CCIndex and Apache Phoenix) show that our design can reduce the insert overhead for secondary indices to just 23%. Evaluation with TPC-H queries demonstrates up to a 2x increase in query throughput, while application evaluation with real-world workloads and data (100M records) provided by AdMaster Inc. shows up to a 35% reduction in execution time.
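The abstract does not spell out how a secondary index serves a range query on a non-key column, or what a parallel insert looks like. The sketch below is a minimal, single-process illustration under stated assumptions, not the paper's design. A DOT table is modeled as an in-memory table kept sorted by row key; the class names `OrderedTable` and `IndexedTable` are hypothetical. The index row key is the pair (indexed value, base row key), a common layout for secondary indexes on ordered stores. "Parallel insert" is assumed to mean issuing the base-table put and the index-table put concurrently rather than one after the other. In HBase each put would be a remote call, so overlapping them could hide a network round trip. In this Python toy the point is the structure, not the speed. Updates and deletes, which would leave stale index entries, are ignored.

```python
import bisect
import threading
from concurrent.futures import ThreadPoolExecutor


class OrderedTable:
    """Toy stand-in for one DOT table: rows kept sorted by row key (hypothetical)."""

    def __init__(self):
        self._keys = []          # row keys in sorted order
        self._rows = {}          # row key -> value
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            if key not in self._rows:
                bisect.insort(self._keys, key)
            self._rows[key] = value

    def get(self, key):
        with self._lock:
            return self._rows.get(key)

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        with self._lock:
            lo = bisect.bisect_left(self._keys, start)
            hi = bisect.bisect_left(self._keys, stop)
            return [(k, self._rows[k]) for k in self._keys[lo:hi]]


class IndexedTable:
    """Base table plus one secondary index table on `column` (hypothetical)."""

    def __init__(self, column):
        self.column = column
        self.base = OrderedTable()
        # Index row key is (indexed value, base row key), so index rows sort by
        # the indexed value and rows sharing a value stay distinct.
        self.index = OrderedTable()
        self._pool = ThreadPoolExecutor(max_workers=2)

    def insert(self, row_key, row):
        # Assumed "parallel insert": submit both puts at once, then wait for
        # both, instead of writing the index only after the base put returns.
        base_put = self._pool.submit(self.base.put, row_key, row)
        index_put = self._pool.submit(
            self.index.put, (row[self.column], row_key), row_key)
        base_put.result()
        index_put.result()

    def range_by_column(self, lo, hi):
        """Rows whose indexed value v satisfies lo <= v < hi."""
        # (lo,) sorts before every (lo, row_key); (hi,) sorts before every (hi, row_key).
        hits = self.index.scan((lo,), (hi,))
        return [self.base.get(row_key) for _, row_key in hits]

    def close(self):
        self._pool.shutdown()


t = IndexedTable("price")
t.insert("r1", {"price": 30})
t.insert("r2", {"price": 10})
t.insert("r3", {"price": 20})
print([r["price"] for r in t.range_by_column(10, 25)])  # [10, 20]
t.close()
```

Without the index, the same query would have to scan every base row and filter on `price`. With the index, it becomes one range scan over the index table followed by point lookups in the base table. That extra index write on every insert is the overhead the abstract refers to.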
Keywords
DOT, RDMA, HBase, Indexing, CCIndex, Phoenix