Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control.

Proceedings of the ACM on Management of Data(2023)

引用 0|浏览4
暂无评分
摘要
Many big data systems are written in languages such as C, C++, Java, and Scala to process large amounts of data efficiently, while data analysts often use Python to conduct data wrangling, statistical analysis, and machine learning. User-defined functions (UDFs) are commonly used in these systems to bridge the gap between the two ecosystems. In this paper, we propose Udon, a novel debugger to support fine-grained debugging of UDFs. Udon encapsulates the modern line-by-line debugging primitives, such as the ability to set breakpoints, perform code inspections, and make code modifications while executing a UDF on a single tuple. It includes a novel debug-aware UDF execution model to ensure the responsiveness of the operator during debugging. It utilizes advanced state-transfer techniques to satisfy breakpoint conditions that span across multiple UDFs. It incorporates various optimization techniques to reduce the runtime overhead. We conduct experiments with multiple UDF workloads on various datasets and show its high efficiency and scalability.
更多
查看译文
关键词
big data systems,debugging,user-defined functions (UDFs)
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要