Steering a Fleet: Adaptation for Large-Scale, Workflow-Based Experiments
arXiv (2024)
Abstract
Experimental science is increasingly driven by instruments that produce vast
volumes of data and thus a need to manage, compute, describe, and index this
data. High performance and distributed computing provide the means of
addressing the computing needs; however, in practice, the variety of actions
required and the distributed set of resources involved require sophisticated
"flows" defining the steps to be performed on data. As each scan or measurement
is performed by an instrument, a new instance of the flow is initiated
resulting in a "fleet" of concurrently running flows, with the overall goal to
process all the data collected during a potentially long-running experiment.
During the course of the experiment, each flow may need to adapt its execution
due to changes in the environment, such as computational or storage resource
availability, or based on the progress of the fleet as a whole, such as the
completion or discovery of an intermediate result that changes the behavior of
subsequent flows. We introduce a cloud-based decision engine, Braid,
which flows consult during execution to query their run-time environment and
coordinate with other flows within their fleet. Braid accepts streams of
measurements taken from the run-time environment or from within flow runs, which
can then be statistically aggregated and compared to other streams to determine
a strategy to guide flow execution. For example, queue lengths in execution
environments can be used to direct a flow to run computations in one
environment or another, or experiment progress as measured by individual flows
can be aggregated to determine the progress and subsequent direction of the
flows within a fleet. We describe Braid, its interface, its implementation, and
its performance characteristics. We further show, through examples and our
experience modifying an existing scientific flow, how Braid is used to make
flows adaptable.
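
The queue-length example in the abstract can be illustrated with a minimal sketch. This is not Braid's actual interface, which the abstract does not specify: the `StreamStore` class, the `choose_environment` function, and the stream names are all hypothetical, and an in-memory dictionary stands in for the cloud service. The sketch only shows the general idea of aggregating recent samples from several measurement streams and comparing the aggregates to pick a direction for a flow.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List


class StreamStore:
    """Hypothetical in-memory stand-in for a measurement-stream service."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[float]] = defaultdict(list)

    def add_sample(self, stream: str, value: float) -> None:
        """Append one measurement (e.g. a queue length) to a named stream."""
        self._streams[stream].append(value)

    def aggregate(
        self,
        stream: str,
        func: Callable[[List[float]], float] = mean,
        window: int = 5,
    ) -> float:
        """Apply an aggregation function to the most recent `window` samples."""
        samples = self._streams[stream][-window:]
        if not samples:
            raise ValueError(f"no samples recorded for stream {stream!r}")
        return func(samples)


def choose_environment(store: StreamStore, streams: List[str], window: int = 5) -> str:
    """Return the stream whose recent mean is lowest (assumed: lower is better)."""
    return min(streams, key=lambda name: store.aggregate(name, mean, window))


if __name__ == "__main__":
    store = StreamStore()
    # Hypothetical queue-length samples reported for two execution environments.
    for q in (12, 15, 11):
        store.add_sample("cluster_a.queue_length", q)
    for q in (3, 4, 20):
        store.add_sample("cluster_b.queue_length", q)

    target = choose_environment(
        store, ["cluster_a.queue_length", "cluster_b.queue_length"]
    )
    print("run next computation on:", target)
```

A flow in this sketch would call `choose_environment` before submitting a compute step. Narrowing `window` makes the decision more sensitive to the latest samples, while widening it smooths out short spikes.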