Learning Sensitive Combinations of A/B Test Metrics

WSDM (2017)

Cited by 29 | Views 37
Abstract
Online search evaluation, and A/B testing in particular, is an irreplaceable tool for modern search engines. Typically, online experiments last for several days or weeks and require a considerable portion of the search traffic. This restricts their usefulness and applicability.

To alleviate the need for large sample sizes in A/B experiments, several approaches have been proposed. Primarily, these approaches increase the sensitivity (informally, the ability to detect changes with fewer observations) of the evaluation metrics. Such sensitivity improvements are achieved by applying variance reduction methods, e.g. stratification and control covariates. However, the ability to learn sensitive metric combinations that (a) agree with the ground-truth metric and (b) are more sensitive has not been explored in the A/B testing scenario.

In this work, we aim to close this gap. We formulate the problem of finding a sensitive metric combination as a data-driven machine learning problem and propose two intuitive optimization approaches to address it. We then perform an extensive experimental study of the proposed approaches, using a dataset of 118 A/B tests performed by Yandex and eight state-of-the-art ground-truth user engagement metrics, including Sessions per User and Absence Time. Our results suggest that considerable sensitivity improvements over the ground-truth metrics can be achieved with our proposed approaches.
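The abstract does not reproduce the paper's exact objective functions, so the following is only a minimal sketch of the general idea it describes: learn weights for a linear combination of per-user component metrics over historical A/B tests, rewarding the combination's t-statistic when its direction agrees with the ground-truth metric and penalizing it otherwise. The welch_t and objective names, the signed-|t| objective, and the toy data are illustrative assumptions, not taken from the paper.

import numpy as np
from scipy.optimize import minimize

def welch_t(treatment, control):
    """Two-sample Welch t-statistic for 1-D arrays of per-user values."""
    nt, nc = len(treatment), len(control)
    se = np.sqrt(treatment.var(ddof=1) / nt + control.var(ddof=1) / nc)
    return (treatment.mean() - control.mean()) / se

def objective(w, experiments, gt_signs):
    """Negative average signed sensitivity of the combination m @ w:
    reward |t| when the combination moves in the same direction as the
    ground-truth metric, penalize it otherwise."""
    score = 0.0
    for (mt, mc), sign in zip(experiments, gt_signs):
        t = welch_t(mt @ w, mc @ w)
        score += abs(t) if np.sign(t) == sign else -abs(t)
    return -score / len(experiments)

# Toy data: 10 synthetic experiments, 3 component metrics, 500 users per arm.
rng = np.random.default_rng(0)
experiments, gt_signs = [], []
for _ in range(10):
    effect = rng.normal(0.0, 0.05, size=3)   # per-metric treatment effect
    control = rng.normal(size=(500, 3))
    treatment = rng.normal(size=(500, 3)) + effect
    experiments.append((treatment, control))
    gt_signs.append(1.0 if effect[0] > 0 else -1.0)  # metric 0 plays "ground truth"

res = minimize(objective, x0=np.ones(3), args=(experiments, gt_signs),
               method="Nelder-Mead")
print("learned combination weights:", res.x / np.linalg.norm(res.x))

In this reading, the directional penalty encodes requirement (a), agreement with the ground-truth metric, while maximizing |t| encodes requirement (b), higher sensitivity.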
Keywords
A/B tests, online evaluation, sensitivity improvement, metric combination, online controlled experiments