Scalable network analytics for characterization of outbreak influence in voluminous epidemiology datasets

Concurrency and Computation: Practice and Experience (2019)

Abstract
Planning for large-scale epidemiological outbreaks in livestock populations often involves executing compute-intensive disease spread simulations. To capture the probabilities of various outcomes, these simulations are executed several times over a collection of representative input scenarios, producing voluminous data. The resulting datasets contain valuable insights, including sequences of events that lead to extreme outbreaks. However, discovering and leveraging such information is also computationally expensive. In this study, we pursue two goals: (1) providing a distributed framework for modeling disease transmission at scale using Spark, including improvements to the default GraphX partitioning strategy, and (2) giving planners and epidemiologists a means to analyze interactions between entities (herds) during simulated disease outbreaks. Using our disease transmission network (DTN), planners or analysts can isolate herds that have a disproportionate effect on epidemiological outcomes, enabling effective allocation of limited resources such as vaccinations and field personnel. We use a representative dataset to verify our approach, and we optimize the underlying graph partitioning algorithm to ensure that the system scales with increases in dataset size or in the number of participating machines. Our analysis includes the identification of influential herds as well as the creation of machine learning models for accurate classifications that generalize to other datasets.
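To make the idea of isolating influential herds in a DTN more concrete, here is a minimal single-machine sketch. It is not the authors' Spark/GraphX implementation, and the record format and scoring rule are assumptions: each simulation run is assumed to emit hypothetical `(run_id, source_herd, target_herd)` transmission records, and a herd's influence is scored as the average number of herds infected downstream of it per run, which is one plausible reading of "disproportionate effect on epidemiological outcomes". The function names `influence_scores` and `top_influencers` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record: (run_id, source_herd, target_herd) for one simulated transmission.
Event = Tuple[str, str, str]


def _descendants(adj: Dict[str, List[str]], start: str) -> int:
    """Count herds reachable from `start` in one run's transmission graph."""
    seen = set()
    stack = list(adj.get(start, ()))
    while stack:
        herd = stack.pop()
        if herd not in seen:
            seen.add(herd)
            stack.extend(adj.get(herd, ()))
    seen.discard(start)  # guard against cycles in malformed input
    return len(seen)


def influence_scores(events: Iterable[Event], n_runs: Optional[int] = None) -> Dict[str, float]:
    """Average downstream reach per herd across runs (assumed influence metric).

    Pass `n_runs` if some runs produced no transmission events; otherwise
    only runs that appear in `events` are counted in the denominator.
    """
    per_run: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for run, src, dst in events:
        per_run[run][src].append(dst)
    totals: Dict[str, int] = defaultdict(int)
    for adj in per_run.values():
        for herd in adj:  # only herds that transmitted at least once in this run
            totals[herd] += _descendants(adj, herd)
    denom = n_runs if n_runs is not None else len(per_run)
    return {herd: total / denom for herd, total in totals.items()}


def top_influencers(events: Iterable[Event], k: int,
                    n_runs: Optional[int] = None) -> List[Tuple[str, float]]:
    """Return the k highest-scoring herds, ties broken by herd id."""
    scores = influence_scores(events, n_runs)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    sample = [
        ("r1", "A", "B"), ("r1", "B", "C"), ("r1", "B", "D"),
        ("r2", "A", "E"), ("r2", "E", "B"),
    ]
    print(top_influencers(sample, 2))
```

In this toy input, herd `A` seeds both runs and reaches 3 and 2 downstream herds, so it scores 2.5 and ranks first. Because each run's transmission graph is independent, a per-run computation like this is a natural unit to distribute, but how the paper partitions the work across Spark executors is not described here.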
Keywords
disease spread classification, distributed analytics, distributed graph partitioning, epidemiological network analysis, super-spreading events