Robust and efficient memory management in Apache AsterixDB

Software: Practice and Experience (2020)

Abstract
Traditional relational database systems handle data by dividing their memory into sections such as a buffer cache and working memory, assigning a memory budget to each section to efficiently manage a limited amount of overall memory. They also assign memory budgets to memory-intensive operators such as sorts and joins and control the allocation of memory to these operators; each memory-intensive operator attempts to maximize its memory usage to reduce disk I/O cost. Implementing such memory-intensive operators requires a careful design and application of appropriate algorithms that properly utilize memory. Today's Big Data management systems need the ability to handle large amounts of data similarly, as it is unrealistic to assume that truly big data will fit into memory. In this article, we share our memory management experiences in Apache AsterixDB, an open-source Big Data management software platform that scales out horizontally on shared-nothing commodity computing clusters. We describe the implementation of AsterixDB's memory-intensive operators and their designs related to memory management. We also discuss memory management at the global (cluster) level. We conducted an experimental study using several synthetic and real datasets to explore the impact of this work. We believe that future Big Data management system builders can benefit from these experiences.
Keywords
Apache AsterixDB, big data management system, group by, hash join, inverted-index search, memory management, sort