Abstraction Without Regret for Efficient Data Processing

semanticscholar(2014)

Abstract
Tiark Rompf ‡∗, Nada Amin∗, Thierry Coppey†, Mohammad Dashti†, Manohar Jonnalagedda∗, Yannis Klonatos†, Martin Odersky∗, Christoph Koch†
‡ Oracle Labs: {first.last}@oracle.com
∗ Programming Methods Lab / † DATA Lab, EPFL: {first.last}@epfl.ch

Growing data sets require efficiency on all levels of the processing stack. This leads to a trade-off between generality and specialization: On the one hand, we want reusable, generic solutions that can support many different kinds of data and many different processing tasks. On the other hand, programs need to be specialized to data schemata and execution environments to obtain good performance. To give a real-world example, popular open-source and commercial database systems have been shown [10, 12] to perform 10x or 100x worse on certain queries than specialized, hand-written C implementations of the same query. At the same time, such systems contain hundreds of thousands of lines of optimized C code, which suggests that manual optimization may not be cost-effective. The database community has recognized this problem, with prominent researchers arguing to replace generic database systems with specialized solutions [9].

In this talk, we make the case for a) more collaboration between DB and PL researchers, and b) using cutting-edge PL technology such as generative metaprogramming (staging [11]) to turn interpreters, which are ubiquitous in data processing pipelines, into compilers. We present a range of examples from previous and ongoing work in the context of Scala and LMS (Lightweight Modular Staging) [8], including recent collaborative efforts on developing database systems using these techniques.

As a takeaway for programming language designers, we further argue that for truly expressive multi-stage programming, quotation mechanisms should offer more semantics-preservation guarantees, in particular about maintaining statement execution order across stage boundaries.
Motivating Example

Let us consider a small programming example in Scala. We would like to implement a generic library function to read CSV files. A CSV file contains tabular data, where the first line defines the schema, i.e., the names of the columns. We would like to iterate over all the rows in a file and access the data fields by name:

    processCSV("data.txt") { record =>     // sample data:
      if (record("Flag") == "yes")         // Name, Value, Flag
        println(record("Name"))            // A, 7, no
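
A generic, unspecialized version of such a library function might look roughly like the sketch below. This is an illustration only, not the authors' implementation: the names `CsvSketch`, `Record`, and `processLines` are assumptions introduced here, the comma splitting is naive (no quoting or escaping), and the second data row in `main` is made up. Note that every `record("...")` call searches the schema again for each row, which is the kind of generic, interpretive overhead that specialization to a known schema would aim to remove.

```scala
import scala.io.Source

object CsvSketch {
  // Hypothetical record type: resolves a column name against the schema on every access.
  final class Record(fields: Array[String], schema: Array[String]) {
    def apply(key: String): String = {
      val i = schema.indexOf(key)
      require(i >= 0, s"unknown column: $key")
      fields(i)
    }
  }

  // Core loop over already-read lines; the first line is taken as the schema.
  def processLines(lines: Iterator[String])(body: Record => Unit): Unit = {
    if (lines.hasNext) {
      val schema = lines.next().split(",").map(_.trim)
      for (line <- lines if line.trim.nonEmpty) {
        val fields = line.split(",").map(_.trim)
        body(new Record(fields, schema))
      }
    }
  }

  // File-based entry point with the same call shape as the example above.
  def processCSV(file: String)(body: Record => Unit): Unit = {
    val src = Source.fromFile(file)
    try processLines(src.getLines())(body)
    finally src.close()
  }

  def main(args: Array[String]): Unit = {
    // In-memory stand-in for "data.txt"; the "B, 2, yes" row is invented for illustration.
    val sample = Iterator("Name, Value, Flag", "A, 7, no", "B, 2, yes")
    processLines(sample) { record =>
      if (record("Flag") == "yes") println(record("Name")) // prints "B"
    }
  }
}
```

Splitting the line-processing loop from the file handling keeps the sketch easy to exercise without touching the file system; `processCSV("data.txt") { record => ... }` then behaves the same way on a real file.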