Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra

Proceedings of the 2019 International Conference on Management of Data(2019)

引用 39|浏览35
暂无评分
摘要
Accelerating machine learning (ML) over relational data is a key focus of the database community. While many real-world datasets are multi-table, most ML tools expect single-table inputs, forcing users to materialize joins before ML, leading to data redundancy and runtime waste. Recent works on ''factorized ML'' address such issues by pushing ML through joins. However, they have hitherto been restricted to ML models linear in the feature space, rendering them less effective when users construct non-linear feature interactions such as pairwise products to boost ML accuracy. In this work, we take a first step towards closing this gap by introducing a new abstraction to enable pairwise feature interactions in multi-table data and present an extensive framework of algebraic rewrite rules for factorized LA operators over feature interactions. Our rewrite rules carefully exploit the interplay of the redundancy caused by both joins and interactions. We prototype our framework in Python to build a tool we call MorpheusFI. An extensive empirical evaluation with both synthetic and real datasets shows that MorpheusFI yields up to 5x speedups over materialized execution for a popular second-order gradient method and even an order of magnitude speedups over a popular stochastic gradient method.
更多
查看译文
关键词
data management for ml, factorized ml, feature interactions, linear algebra
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要