CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning
CoRR (2024)
Abstract
Advances in graph machine learning (ML) have been driven by applications in
chemistry as graphs have remained the most expressive representations of
molecules. While early graph ML methods focused primarily on small organic
molecules, recently, the scope of graph ML has expanded to include inorganic
materials. Modelling the periodicity and symmetry of inorganic crystalline
materials poses unique challenges, which existing graph ML methods are unable
to address. Moving to inorganic nanomaterials increases complexity, as the number
of nodes within each graph can span a broad range (10 to 10^5). The bulk of
existing graph ML focuses on characterising molecules and materials by
predicting target properties with graphs as input. However, the most exciting
applications of graph ML will be in its generative capabilities, which are
currently not on par with those in other domains such as images or text.
We invite the graph ML community to address these open challenges by
presenting two new chemically-informed large-scale inorganic (CHILI)
nanomaterials datasets: a medium-scale dataset (with overall >6M nodes, >49M
edges) of mono-metallic oxide nanomaterials generated from 12 selected crystal
types (CHILI-3K) and a large-scale dataset (with overall >183M nodes, >1.2B
edges) of nanomaterials generated from experimentally determined crystal
structures (CHILI-100K). We define 11 property prediction tasks and 6 structure
prediction tasks, which are of special interest for nanomaterial research. We
benchmark the performance of a wide array of baseline methods and use these
benchmarking results to highlight areas which need future work. To the best of
our knowledge, CHILI-3K and CHILI-100K are the first open-source nanomaterial
datasets of this scale – both on the individual graph level and of the dataset
as a whole – and the only nanomaterials datasets with high structural and
elemental diversity.