4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs
arXiv (2024)
Abstract
Although relational databases (RDBs) store vast amounts of rich, informative data spread across
interconnected tables, the progress of predictive machine learning models as
applied to such data arguably falls well behind advances in other domains such
as computer vision or natural language processing. This deficit stems, at least
in part, from the lack of established, public RDB benchmarks needed for
training and evaluation purposes. As a result, related model development thus
far often defaults to tabular approaches trained on ubiquitous single-table
benchmarks, or on the relational side, graph-based alternatives such as GNNs
applied to a completely different set of graph datasets devoid of tabular
characteristics. To more precisely target RDBs lying at the nexus of these two
complementary regimes, we explore a broad class of baseline models predicated
on: (i) converting multi-table datasets into graphs using various strategies
equipped with efficient subsampling, while preserving tabular characteristics;
and (ii) trainable models with well-matched inductive biases that output
predictions based on these input subgraphs. Then, to address the dearth of
suitable public benchmarks and reduce siloed comparisons, we assemble a diverse
collection of (i) large-scale RDB datasets and (ii) coincident predictive
tasks. From a delivery standpoint, we operationalize the above four dimensions
(4D) of exploration within a unified, scalable open-source toolbox called
4DBInfer. We conclude by presenting evaluations using 4DBInfer, the results of
which highlight the importance of considering each such dimension in the design
of RDB predictive models, as well as the limitations of more naive approaches
such as simply joining adjacent tables. Our source code is released at
https://github.com/awslabs/multi-table-benchmark.
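
As a rough illustration of dimension (i) above, converting multi-table data into a graph and then subsampling it, the following minimal Python sketch treats every row as a node and every foreign-key reference as an edge, then draws a small neighborhood around a seed row. It is not the 4DBInfer API: the toy `users` and `orders` tables, the `rows_to_graph` and `sample_subgraph` helpers, and the uniform fanout-based sampling rule are all assumptions made for illustration, and the toolbox may implement these steps quite differently.

```python
import random
from collections import defaultdict

# Hypothetical toy RDB: orders.user_id is a foreign key into users.user_id.
users = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": 27},
]
orders = [
    {"order_id": 10, "user_id": 1, "amount": 20.0},
    {"order_id": 11, "user_id": 1, "amount": 5.5},
    {"order_id": 12, "user_id": 2, "amount": 13.0},
]


def rows_to_graph(parent_rows, parent_key, child_rows, child_key, fk):
    """Turn each row into a node and each foreign-key reference into an undirected edge.

    The full row is kept as the node's feature record, so tabular
    attributes stay available to a downstream model.
    """
    features = {}
    adjacency = defaultdict(list)
    for row in parent_rows:
        features[("parent", row[parent_key])] = row
    for row in child_rows:
        child = ("child", row[child_key])
        parent = ("parent", row[fk])
        features[child] = row
        adjacency[child].append(parent)
        adjacency[parent].append(child)
    return features, adjacency


def sample_subgraph(adjacency, seed, fanout, hops, rng):
    """Uniform neighbor sampling: keep at most `fanout` neighbors per node per hop."""
    visited = {seed}
    frontier = [seed]
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            neighbors = adjacency[node]
            for n in rng.sample(neighbors, min(fanout, len(neighbors))):
                if n not in visited:
                    visited.add(n)
                    next_frontier.append(n)
        frontier = next_frontier
    return visited


features, adjacency = rows_to_graph(users, "user_id", orders, "order_id", "user_id")
subgraph = sample_subgraph(adjacency, ("parent", 1), fanout=1, hops=1, rng=random.Random(0))
```

Here `subgraph` holds the seed user row plus one sampled order row; a model for dimension (ii) would then consume the feature records of those nodes. Capping the fanout is one simple way to bound subgraph size on large databases, chosen here only for brevity.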