The ATLAS Production System Evolution: New Data Processing and Analysis Paradigm for the LHC Run2 and High-Luminosity

Journal of Physics: Conference Series(2017)

引用 12|浏览2
暂无评分
摘要
The second generation of the ATLAS Production System called ProdSys2 is a distributed workload manager that runs daily hundreds of thousands of jobs, from dozens of different ATLAS specific workflows, across more than hundred heterogeneous sites. It achieves high utilization by combining dynamic job definition based on many criteria, such as input and output size, memory requirements and CPU consumption, with manageable scheduling policies and by supporting different kind of computational resources, such as GRID, clouds, supercomputers and volunteer-computers. The system dynamically assigns a group of jobs (task) to a group of geographically distributed computing resources. Dynamic assignment and resources utilization is one of the major features of the system, it didn't exist in the earliest versions of the production system where Grid resources topology was predefined using national or/and geographical pattern. Production System has a sophisticated job fault-recovery mechanism, which efficiently allows to run multi-Terabyte tasks without human intervention. We have implemented "train" model and open-ended production which allow to submit tasks automatically as soon as new set of data is available and to chain physics groups data processing and analysis with central production by the experiment. We present an overview of the ATLAS Production System and its major components features and architecture: task definition, web user interface and monitoring. We describe the important design decisions and lessons learned from an operational experience during the first year of LHC Run2. We also report the performance of the designed system and how various workflows, such as data (re)processing, Monte-Carlo and physics group production, users analysis, are scheduled and executed within one production system on heterogeneous computing resources.
更多
查看译文
关键词
atlas production system evolution,lhc run2,new data processing,high-luminosity
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要