Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process
CoRR(2024)
Abstract
With the advent of artificial intelligence (AI) and machine learning (ML),
various science and engineering communities have leveraged data-driven
surrogates to model complex systems from numerous sources of information
(data). This proliferation has led to a significant reduction in the cost and
time involved in developing superior systems designed to perform specific
functionalities. A high proportion of such surrogates are built by extensively
fusing multiple sources of data, be it published papers, patents, open
repositories, or other resources. However, not much attention has been paid to
the differences in quality and comprehensiveness of the known and unknown
underlying physical parameters of the information sources that could have
downstream implications during system optimization. Towards resolving this
issue, a multi-source data fusion framework based on Latent Variable Gaussian
Process (LVGP) is proposed. The individual data sources are tagged as a
characteristic categorical variable that is mapped into a physically
interpretable latent space, allowing the development of source-aware data
fusion modeling. Additionally, a dissimilarity metric based on the latent
variables of LVGP is introduced to study and understand the differences in the
sources of data. The proposed approach is demonstrated on and analyzed through
two mathematical (representative parabola problem, 2D Ackley function) and two
materials science (design of FeCrAl and SmCoFe alloys) case studies. From the
case studies, it is observed that, compared to single-source and
source-unaware ML models, the proposed multi-source data fusion framework can provide
better predictions for sparse-data problems, interpretability regarding the
sources, and enhanced modeling capabilities by taking advantage of the
correlations and relationships among different sources.
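
The source-aware idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 2D latent space, the hard-coded latent positions (which would normally be estimated together with the other GP hyperparameters), and the use of Euclidean distance as the dissimilarity measure are all assumptions made here for illustration, since the abstract does not give the exact form of the metric.

```python
import numpy as np

def lvgp_correlation(x1, s1, x2, s2, latent, lengthscale=1.0):
    """Gaussian correlation between two samples, each tagged with a source index.

    The source index selects a point in the latent space, so samples from
    sources with nearby latent points are treated as strongly correlated.
    """
    dx = np.sum(((np.asarray(x1) - np.asarray(x2)) / lengthscale) ** 2)
    dz = np.sum((latent[s1] - latent[s2]) ** 2)
    return np.exp(-(dx + dz))

def source_dissimilarity(latent):
    """Pairwise Euclidean distance between latent source positions (assumed metric)."""
    diff = latent[:, None, :] - latent[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

# Hypothetical latent positions for three data sources; values are illustrative only.
latent = np.array([[0.0, 0.0],
                   [0.3, 0.4],
                   [3.0, 4.0]])

D = source_dissimilarity(latent)
k_near = lvgp_correlation([0.5], 0, [0.5], 1, latent)  # sources 0 and 1 are close
k_far = lvgp_correlation([0.5], 0, [0.5], 2, latent)   # sources 0 and 2 are far apart
```

In this sketch, a small entry in `D` means two sources behave similarly and can share information, while a large entry flags a source whose data follows a different underlying response.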