The HSF Conditions Database Reference Implementation
CoRR (2024)
Abstract
Conditions data is the subset of non-event data that is necessary to process
event data. It poses a unique set of challenges, namely a heterogeneous
structure and high access rates by distributed computing. The HSF Conditions
Databases activity is a forum for cross-experiment discussions inviting as
broad a participation as possible. It grew out of the HSF Community White Paper
work to study conditions data access, where experts from ATLAS, Belle II, and
CMS converged on a common language and proposed a schema that represents best
practice. Following discussions with a broader community, including nuclear
physics (NP) as well as high-energy physics (HEP) experiments, a core set of
use cases, functionality and behaviour was defined with the aim of describing
a core conditions database API. This paper describes the reference
implementation of both the conditions database service and the client, which
together encapsulate HSF best-practice conditions data handling. Django was
chosen for the service implementation, which uses an object-relational mapper
(ORM) instead of direct SQL for all but one method. The simple
relational database schema to organise conditions data is implemented in
PostgreSQL. The task of storing conditions data payloads themselves is
outsourced to any POSIX-compliant filesystem, allowing for transparent
relocation and redundancy. Crucially, this design provides a clear separation
between retrieving the metadata describing which conditions data are needed for
a data processing job, and retrieving the actual payloads from storage. The
service deployment using Helm on OKD will be described together with scaling
tests and operations experience from the sPHENIX experiment running more than
25k cores at BNL.
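
The separation between metadata lookup and payload retrieval described above can be illustrated with a minimal sketch. It is not the reference implementation: the record type PayloadIOV, the functions resolve_payload_path and fetch_payload, the use of run numbers as validity keys, and the file names are all hypothetical, chosen only to show the two steps in isolation. The metadata step is modelled as an in-memory lookup rather than a call to the service, and the payload step as a plain file read from a POSIX directory.

```python
import bisect
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PayloadIOV:
    """Hypothetical metadata record: a payload valid from `start` onward."""
    start: int         # first run number for which the payload is valid
    payload_path: str  # payload location relative to a storage root


def resolve_payload_path(iovs, run):
    """Step 1 (metadata only): pick the record with the latest start <= run.

    `iovs` must be sorted by `start`. No payload bytes are touched here.
    """
    starts = [iov.start for iov in iovs]
    idx = bisect.bisect_right(starts, run) - 1
    if idx < 0:
        raise LookupError(f"no payload valid for run {run}")
    return iovs[idx].payload_path


def fetch_payload(storage_root, payload_path):
    """Step 2 (payload only): an ordinary file read from POSIX storage."""
    return (Path(storage_root) / payload_path).read_bytes()


def demo():
    """Write two toy payloads to a temporary directory and look one up."""
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "calib_v1.dat").write_bytes(b"gain=1.00")
        (Path(root) / "calib_v2.dat").write_bytes(b"gain=1.05")
        iovs = [
            PayloadIOV(start=0, payload_path="calib_v1.dat"),
            PayloadIOV(start=100, payload_path="calib_v2.dat"),
        ]
        path = resolve_payload_path(iovs, run=150)
        return path, fetch_payload(root, path)


if __name__ == "__main__":
    print(demo())
```

Because the second step only needs a relative path and a storage root, the root can point at any mounted copy of the payload store, which is one way to read the abstract's claim about transparent relocation and redundancy.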