4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs
Conference on Neural Information Processing Systems (2024)
Amazon Web Services
Abstract
Although relational databases (RDBs) store vast amounts of rich, informative data spread across interconnected tables, the progress of predictive machine learning models applied to such data arguably falls well behind advances in other domains such as computer vision or natural language processing. This deficit stems, at least in part, from the lack of established public RDB benchmarks for training and evaluation. As a result, related model development thus far often defaults to tabular approaches trained on ubiquitous single-table benchmarks or, on the relational side, to graph-based alternatives such as GNNs applied to a completely different set of graph datasets devoid of tabular characteristics. To more precisely target RDBs lying at the nexus of these two complementary regimes, we explore a broad class of baseline models predicated on: (i) converting multi-table datasets into graphs using various strategies equipped with efficient subsampling, while preserving tabular characteristics; and (ii) trainable models with well-matched inductive biases that output predictions based on these input subgraphs. Then, to address the dearth of suitable public benchmarks and reduce siloed comparisons, we assemble a diverse collection of (i) large-scale RDB datasets and (ii) coincident predictive tasks. From a delivery standpoint, we operationalize the above four dimensions (4D) of exploration within a unified, scalable open-source toolbox called 4DBInfer. We conclude by presenting evaluations using 4DBInfer, the results of which highlight the importance of considering each such dimension in the design of RDB predictive models, as well as the limitations of more naive approaches such as simply joining adjacent tables. Our source code is released at https://github.com/awslabs/multi-table-benchmark.
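To make step (i) more concrete, the following is a minimal sketch of one generic way to turn a multi-table dataset into a graph and subsample it: every row becomes a node keyed by (table, primary key), every foreign-key reference becomes an edge, non-key columns are kept as node features, and a fixed per-hop fanout bounds the size of the sampled subgraph around a seed row. It is not the 4DBInfer API or the authors' exact conversion strategy. The toy `users`/`orders` tables and the helper names `rows_to_graph` and `sample_subgraph` are hypothetical, and uniform neighbor sampling is assumed as the subsampling scheme.

```python
import random
from collections import defaultdict

# Hypothetical toy RDB: orders.user_id is a foreign key into users.user_id.
users = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": 27},
    {"user_id": 3, "age": 45},
]
orders = [
    {"order_id": 10, "user_id": 1, "amount": 20.0},
    {"order_id": 11, "user_id": 1, "amount": 35.5},
    {"order_id": 12, "user_id": 2, "amount": 12.0},
    {"order_id": 13, "user_id": 3, "amount": 80.0},
    {"order_id": 14, "user_id": 1, "amount": 5.0},
]


def rows_to_graph(tables, pkeys, fkeys):
    """Hypothetical helper: each row -> node (table, pk); each FK reference -> undirected edge.

    Non-key columns are kept as node features so tabular attributes are preserved.
    fkeys maps (source_table, fk_column) -> referenced_table.
    """
    features = {}
    adj = defaultdict(list)
    for name, rows in tables.items():
        pk = pkeys[name]
        for row in rows:
            features[(name, row[pk])] = {
                k: v for k, v in row.items() if k != pk and (name, k) not in fkeys
            }
    for (src_table, fk_col), dst_table in fkeys.items():
        pk = pkeys[src_table]
        for row in tables[src_table]:
            u, v = (src_table, row[pk]), (dst_table, row[fk_col])
            adj[u].append(v)  # store both directions so messages can flow either way
            adj[v].append(u)
    return features, adj


def sample_subgraph(adj, seed, fanouts, rng):
    """Hypothetical helper: uniform fanout sampling, at most fanouts[i] neighbors per node at hop i.

    Returns the sampled node set and (neighbor, target) edge pairs, i.e. the
    direction in which messages would be passed toward the seed row.
    """
    nodes, edges, frontier = {seed}, [], [seed]
    for fanout in fanouts:
        next_frontier = []
        for u in frontier:
            nbrs = adj.get(u, [])
            for v in rng.sample(nbrs, min(fanout, len(nbrs))):
                edges.append((v, u))
                if v not in nodes:
                    nodes.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return nodes, edges


features, adj = rows_to_graph(
    {"users": users, "orders": orders},
    pkeys={"users": "user_id", "orders": "order_id"},
    fkeys={("orders", "user_id"): "users"},
)
nodes, edges = sample_subgraph(adj, ("users", 1), fanouts=[2, 1], rng=random.Random(0))
print(sorted(nodes))
print(edges)
```

Here user 1 has three orders, so the first hop keeps two of them and the second hop only leads back to the already-visited user. The sampled subgraph, together with its node features, is the kind of input that a trainable model in step (ii) would consume; bounding the fanout is what keeps the cost per prediction independent of how many rows a busy entity references.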