4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs

Minjie Wang, Quan Gan, David Wipf, Zhenkun Cai, Ning Li, Jianheng Tang, Yanlin Zhang, Zizhao Zhang, Zunyao Mao, Yakun Song, Yanbo Wang, Jiahang Li, Han Zhang, Guang Yang, Xiao Qin, Chuan Lei, Muhan Zhang, Weinan Zhang, Christos Faloutsos, Zheng Zhang

Conference on Neural Information Processing Systems (2024)

Amazon Web Services

Abstract

Although RDBs store vast amounts of rich, informative data spread across interconnected tables, the progress of predictive machine learning models as applied to such tasks arguably falls well behind advances in other domains such as computer vision or natural language processing. This deficit stems, at least in part, from the lack of established/public RDB benchmarks as needed for training and evaluation purposes. As a result, related model development thus far often defaults to tabular approaches trained on ubiquitous single-table benchmarks, or on the relational side, graph-based alternatives such as GNNs applied to a completely different set of graph datasets devoid of tabular characteristics. To more precisely target RDBs lying at the nexus of these two complementary regimes, we explore a broad class of baseline models predicated on: (i) converting multi-table datasets into graphs using various strategies equipped with efficient subsampling, while preserving tabular characteristics; and (ii) trainable models with well-matched inductive biases that output predictions based on these input subgraphs. Then, to address the dearth of suitable public benchmarks and reduce siloed comparisons, we assemble a diverse collection of (i) large-scale RDB datasets and (ii) coincident predictive tasks. From a delivery standpoint, we operationalize the above four dimensions (4D) of exploration within a unified, scalable open-source toolbox called 4DBInfer. We conclude by presenting evaluations using 4DBInfer, the results of which highlight the importance of considering each such dimension in the design of RDB predictive models, as well as the limitations of more naive approaches such as simply joining adjacent tables. Our source code is released at https://github.com/awslabs/multi-table-benchmark.
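
As a rough illustration of what "converting multi-table datasets into graphs" can mean, the sketch below treats each row as a node and each foreign-key reference as an edge, keeping the remaining columns as node features. This is only one possible strategy under simplifying assumptions; it is not the 4DBInfer API, and the tables (`users`, `purchases`), the column names, and the helper `rows_to_graph` are hypothetical names invented for this example.

```python
# Hypothetical toy database: two tables, each row stored as a dict.
users = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": 27},
]
purchases = [
    {"purchase_id": 10, "user_id": 1, "amount": 20.0},
    {"purchase_id": 11, "user_id": 1, "amount": 5.5},
    {"purchase_id": 12, "user_id": 2, "amount": 13.0},
]


def rows_to_graph(tables, primary_keys, foreign_keys):
    """Build a heterogeneous graph from tables (illustrative helper, not a library call).

    tables:       {table_name: [row_dict, ...]}
    primary_keys: {table_name: primary_key_column}
    foreign_keys: [(child_table, fk_column, parent_table), ...]
    Returns (nodes, edges), where nodes maps (table, key) -> feature dict
    and edges is a list of ((child_table, key), (parent_table, key)) pairs.
    """
    nodes = {}
    for name, rows in tables.items():
        pk = primary_keys[name]
        for row in rows:
            # Node features: every column except the primary key.
            nodes[(name, row[pk])] = {k: v for k, v in row.items() if k != pk}

    edges = []
    for child, fk_col, parent in foreign_keys:
        pk = primary_keys[child]
        for row in tables[child]:
            # One edge per foreign-key reference, from the child row to the parent row.
            edges.append(((child, row[pk]), (parent, row[fk_col])))
    return nodes, edges


nodes, edges = rows_to_graph(
    tables={"users": users, "purchases": purchases},
    primary_keys={"users": "user_id", "purchases": "purchase_id"},
    foreign_keys=[("purchases", "user_id", "users")],
)
```

Unlike a plain join of the two tables, this representation keeps each user as a single node even when that user has several purchases, so the tabular features of both tables stay attached to their own rows.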
