
Steering a Fleet: Adaptation for Large-Scale, Workflow-Based Experiments

arXiv (Cornell University) (2024)

University of Chicago Chicago

Abstract
Experimental science is increasingly driven by instruments that produce vast volumes of data and thus a need to manage, compute, describe, and index this data. High performance and distributed computing provide the means of addressing the computing needs; however, in practice, the variety of actions required and the distributed set of resources involved require sophisticated "flows" defining the steps to be performed on data. As each scan or measurement is performed by an instrument, a new instance of the flow is initiated, resulting in a "fleet" of concurrently running flows, with the overall goal of processing all the data collected during a potentially long-running experiment. During the course of the experiment, each flow may need to adapt its execution due to changes in the environment, such as computational or storage resource availability, or based on the progress of the fleet as a whole, such as completion or discovery of an intermediate result leading to a change in subsequent flows' behavior. We introduce a cloud-based decision engine, Braid, which flows consult during execution to query their run-time environment and coordinate with other flows within their fleet. Braid accepts streams of measurements taken from the run-time environment or from within flow runs, which can then be statistically aggregated and compared to other streams to determine a strategy to guide flow execution. For example, queue lengths in execution environments can be used to direct a flow to run computations in one environment or another, or experiment progress as measured by individual flows can be aggregated to determine the progress and subsequent direction of the flows within a fleet. We describe Braid, its interface, implementation, and performance characteristics. We further show through examples and experience modifying an existing scientific flow how Braid is used to make adaptable flows.
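
The queue-length example in the abstract can be illustrated with a minimal sketch. This is not Braid's interface, which the abstract does not specify; the names `MeasurementStream`, `aggregate`, and `choose_environment`, the choice of a mean over the most recent samples, and the sample values are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean


class MeasurementStream:
    """Hypothetical in-memory stream that keeps the most recent samples."""

    def __init__(self, name, max_samples=100):
        self.name = name
        self.samples = deque(maxlen=max_samples)

    def add(self, value):
        # Record one measurement, e.g. a queue length reported by a site.
        self.samples.append(float(value))

    def aggregate(self, last_n=5):
        """Mean of the last `last_n` samples; raises if the stream is empty."""
        if not self.samples:
            raise ValueError(f"stream {self.name!r} has no samples")
        recent = list(self.samples)[-last_n:]
        return mean(recent)


def choose_environment(streams, last_n=5):
    """Return the name of the stream with the lowest aggregated queue length."""
    return min(streams, key=lambda s: s.aggregate(last_n)).name


# Illustrative (made-up) queue lengths for two execution environments.
cluster_a = MeasurementStream("cluster_a")
cluster_b = MeasurementStream("cluster_b")
for q in [12, 15, 11, 14, 13]:
    cluster_a.add(q)
for q in [3, 40, 2, 4, 1]:
    cluster_b.add(q)

# A flow would consult the decision before submitting its computation.
decision = choose_environment([cluster_a, cluster_b])
```

Aggregating over a window rather than reading only the latest sample keeps one transient spike (the 40 in `cluster_b`) from dominating the decision, although a real policy might prefer a median or a time-weighted statistic.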
Key words
Scientific Workflows, Workflow Management, Computational Research

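[Key points]: The paper proposes Braid, a cloud-based decision engine for adaptation and coordination in large-scale, workflow-based experiments, designed to handle environmental changes and resource adjustments over the course of an experiment.

[Methods]: The authors design Braid so that flows can query their run-time environment during execution and coordinate with other flows, allowing them to adapt to changes in computational and storage resources.

[Experiments]: By modifying an existing scientific flow, the study shows how Braid is used to make flows adaptable and characterizes its performance; the names of any specific datasets used are not mentioned.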