PlanAlyzer: Assessing Threats to the Validity of Online Experiments

Communications of the ACM (2021)

Abstract
Online experiments are an integral part of the design and evaluation of software infrastructure at Internet firms. To handle the growing scale and complexity of these experiments, firms have developed software frameworks for their design and deployment. Ensuring that the results of experiments in these frameworks are trustworthy (a property referred to as internal validity) can be difficult. Currently, verifying internal validity requires manual inspection by someone with substantial expertise in experimental design. We present the first approach for checking the internal validity of online experiments statically, that is, from code alone. We identify well-known problems that arise in experimental design and causal inference, which can take on unusual forms when expressed as computer programs: failures of randomization and treatment assignment, and causal sufficiency errors. Our analyses target PlanOut, a popular framework that features a domain-specific language (DSL) to specify and run complex experiments. We have built PlanAlyzer, a tool that checks PlanOut programs for threats to internal validity, before automatically generating important data for the statistical analyses of a large class of experimental designs. We demonstrate PlanAlyzer's utility on a corpus of PlanOut scripts deployed in production at Facebook, and we evaluate its ability to identify threats on a mutated subset of this corpus. PlanAlyzer has both precision and recall of 92% on the mutated corpus, and 82% of the contrasts it generates match hand-specified data.
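To make the randomization and treatment-assignment failures concrete, the sketch below uses PlanOut-style DSL syntax. The variable names are hypothetical illustrations, not drawn from the paper's corpus; the point is only to show the shape of error a static check could flag.

```
# Valid design: treatment is assigned by a random operator,
# keyed on a unit of randomization (here, userid).
button_color = uniformChoice(choices=['red', 'blue'], unit=userid);

# Randomization failure: the "treatment" below is a constant,
# so every unit receives the same value and no between-condition
# contrast can be estimated from this variable.
button_text = 'Sign up';
```

A static analyzer can detect from code alone that `button_text` never varies across units, whereas a manual reviewer would have to notice the missing random operator by inspection.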