Dflow, a Python framework for constructing cloud-native AI-for-Science workflows
arXiv (2024)
Abstract
In the AI-for-Science era, scientific computing scenarios such as concurrent
learning and high-throughput computing demand a new generation of
infrastructure that supports scalable computing resources and automated
workflow management on both cloud and high-performance supercomputers. Here we
introduce Dflow, an open-source Python toolkit designed for scientists to
construct workflows with simple programming interfaces. It enables complex
process control and task scheduling across a distributed, heterogeneous
infrastructure, leveraging containers and Kubernetes for flexibility. Dflow is
highly observable and can scale to thousands of concurrent nodes per workflow,
enhancing the efficiency of complex scientific computing tasks. The basic unit
in Dflow, known as an Operation (OP), is reusable and independent of the
underlying infrastructure or context. Dozens of workflow projects spanning a
wide range of applications have been built on Dflow. We anticipate that
the reusability of Dflow and its components will encourage more scientists to
publish their workflows and OP components. These components, in turn, can be
adapted and reused in various contexts, fostering greater collaboration and
innovation in the scientific community.
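
The idea of an OP that is reusable and independent of the underlying infrastructure can be illustrated with a minimal plain-Python sketch. This is not Dflow's API: the `Operation` class, `run_local`, and `run_concurrent` are hypothetical names introduced only for illustration, and a thread pool stands in for the containers and Kubernetes scheduling that Dflow relies on. The sketch assumes an operation is a function with declared input and output names, and shows the same operation running unchanged under two different executors.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Operation:
    """Hypothetical stand-in for an OP: a function with declared inputs and outputs."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    func: Callable[..., Dict[str, Any]]

    def run(self, **kwargs: Any) -> Dict[str, Any]:
        # Check the declared interface before and after calling the function,
        # so the operation behaves the same wherever it is executed.
        missing = set(self.inputs) - set(kwargs)
        if missing:
            raise ValueError(f"{self.name}: missing inputs {sorted(missing)}")
        result = self.func(**kwargs)
        if set(result) != set(self.outputs):
            raise ValueError(f"{self.name}: unexpected outputs {sorted(result)}")
        return result


def run_local(op: Operation, param_sets: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Run the operation serially in the current process."""
    return [op.run(**params) for params in param_sets]


def run_concurrent(op: Operation, param_sets: List[Dict[str, Any]],
                   max_workers: int = 4) -> List[Dict[str, Any]]:
    """Run the same operation on a thread pool (a stand-in for a cluster backend)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of param_sets in its results.
        return list(pool.map(lambda params: op.run(**params), param_sets))


def _square(x: int) -> Dict[str, int]:
    return {"y": x * x}


square = Operation(name="square", inputs=("x",), outputs=("y",), func=_square)
params = [{"x": i} for i in range(5)]

serial = run_local(square, params)
parallel = run_concurrent(square, params)
```

Because the operation only declares what it consumes and produces, the choice of executor is made by the caller rather than by the operation itself, which is the property that makes such a unit reusable across contexts.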