Structure-Aware Machine Learning over Multi-Relational Databases

International Conference on Management of Data (2021)

Abstract
We consider the problem of computing machine learning models over multi-relational databases. The mainstream approach involves a costly, repeated loop that data scientists deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into the format of an external learning tool; and learn the desired model using this tool. In this thesis, we advocate an alternative approach that avoids this loop and instead tightly integrates the query and learning tasks into one unified solution. By integrating these two tasks, we can exploit structure in the data and the query to optimize the end-to-end learning problem. We provide a framework for structure-aware learning for a variety of commonly used machine learning models; it achieves runtime guarantees that can be asymptotically faster than the mainstream approach, which first constructs the training dataset. In practice, this asymptotic gap translates into performance improvements of several orders of magnitude over state-of-the-art machine learning packages such as TensorFlow, MADlib, scikit-learn, and mlpack. The thesis is composed of three parts. First, we present the methodology and theoretical foundation of structure-aware learning. Then, we report on the design and implementation of LMFAO, an in-memory engine for structure-aware learning over databases. Finally, we present an extensive experimental evaluation. In the following, we briefly highlight each of these three parts.
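The idea of exploiting query structure instead of first constructing the training dataset can be illustrated with a minimal sketch. It assumes two hypothetical toy relations, R(a, x) and S(a, y), joined on attribute a, and computes the aggregates COUNT, SUM(x*y), and SUM(x*x) over the join; these are the kind of sufficient statistics used to assemble the X^T X matrix for least-squares linear regression. The factorized version aggregates each relation per join key and combines the partial results, so the join is never materialized. The relations and function names are illustrative assumptions, and this shows generic aggregate pushdown past a join, not the LMFAO implementation.

```python
from collections import defaultdict

# Hypothetical toy relations R(a, x) and S(a, y), joined on attribute a.
R = [(1, 2.0), (1, 3.0), (2, 5.0)]
S = [(1, 4.0), (1, 1.0), (2, 2.0), (2, 6.0)]


def naive_aggregates(R, S):
    """Materialize the join R ⋈ S first, then aggregate over its rows."""
    join = [(x, y) for (a_r, x) in R for (a_s, y) in S if a_r == a_s]
    count = len(join)
    sum_xy = sum(x * y for x, y in join)
    sum_xx = sum(x * x for x, _ in join)
    return count, sum_xy, sum_xx


def factorized_aggregates(R, S):
    """Push the aggregates past the join: compute per-key partial aggregates
    in each relation, then combine them without building the join."""
    r_cnt, r_x, r_xx = defaultdict(int), defaultdict(float), defaultdict(float)
    for a, x in R:
        r_cnt[a] += 1
        r_x[a] += x
        r_xx[a] += x * x
    s_cnt, s_y = defaultdict(int), defaultdict(float)
    for a, y in S:
        s_cnt[a] += 1
        s_y[a] += y
    keys = r_cnt.keys() & s_cnt.keys()  # only keys present in both relations join
    count = sum(r_cnt[a] * s_cnt[a] for a in keys)
    sum_xy = sum(r_x[a] * s_y[a] for a in keys)  # sum_x * sum_y per key
    sum_xx = sum(r_xx[a] * s_cnt[a] for a in keys)  # each x^2 repeats once per S match
    return count, sum_xy, sum_xx


print(naive_aggregates(R, S))       # (6, 65.0, 76.0)
print(factorized_aggregates(R, S))  # (6, 65.0, 76.0)
```

Both functions return the same aggregates, but the factorized one does work linear in the sizes of R and S, whereas the materialized join can grow quadratically in the number of tuples sharing a join key. This gap between the join size and the input size is the kind of asymptotic difference the abstract refers to.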