The Cascading Analysts Algorithm.

SIGMOD/PODS '18: International Conference on Management of Data, Houston, TX, USA, June 2018

Abstract
We study changes in metrics that are defined on a Cartesian product of trees. Such metrics occur naturally in many practical applications, where a global metric (such as revenue) can be broken down along several hierarchical dimensions (such as location, gender, etc.). Given a change in such a metric, our goal is to identify a small set of non-overlapping data segments that account for a majority of the change. An organization interested in improving the metric can then focus its attention on these data segments. Our key contribution is an algorithm that naturally mimics the operation of a hierarchical organization of analysts. The algorithm has been successfully applied within Google's ad platform (AdWords) to help Google's advertisers triage the performance of their advertising campaigns, and within Google Analytics to help website developers understand their traffic. We empirically analyze the runtime and quality of the algorithm by comparing it against benchmarks for a census dataset. We prove theoretical, worst-case bounds on the performance of the algorithm. For instance, we show that the algorithm is optimal for two dimensions, and has an approximation ratio of $\log^{d-2}(n+1)$ for $d \geq 3$ dimensions, where $n$ is the number of input data segments. For the advertising application, we can show that our algorithm is a 2-approximation. To characterize the hardness of the problem, we study data patterns called conflicts. These allow us to construct hard instances of the problem, to derive a lower bound of $1.144^{d-2}$ (again for $d \geq 3$) for our algorithm, and to show that the problem is NP-hard; this justifies our focus on approximation.
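To make the problem statement concrete, the following is a minimal brute-force sketch of the selection problem on a toy two-dimensional hierarchy. It is not the Cascading Analysts algorithm, which the abstract does not specify. The hierarchies, the per-segment changes in `DELTAS`, the function names, and the scoring rule (sum of absolute changes over at most `k` pairwise non-overlapping segments) are all assumptions made for illustration. Two segments are treated as overlapping when, in every dimension, one value is an ancestor of (or equal to) the other.

```python
from itertools import combinations

# Hypothetical hierarchies: child -> parent; "*" is the root of each dimension.
LOCATION_PARENT = {"US": "*", "CA": "US", "NY": "US", "UK": "*"}
DEVICE_PARENT = {"mobile": "*", "desktop": "*"}
PARENTS = (LOCATION_PARENT, DEVICE_PARENT)

# Hypothetical change in the metric for each (location, device) segment.
DELTAS = {
    ("*", "*"): -95, ("US", "*"): -100, ("UK", "*"): 5,
    ("*", "mobile"): -87, ("*", "desktop"): -8,
    ("US", "mobile"): -90, ("US", "desktop"): -10,
    ("CA", "*"): -85, ("NY", "*"): -15,
    ("CA", "mobile"): -80, ("CA", "desktop"): -5,
    ("NY", "mobile"): -10, ("NY", "desktop"): -5,
    ("UK", "mobile"): 3, ("UK", "desktop"): 2,
}

def ancestors(node, parent):
    """Return the node together with all of its ancestors up to the root."""
    chain = {node}
    while node in parent:
        node = parent[node]
        chain.add(node)
    return chain

def comparable(a, b, parent):
    """True if one node is an ancestor of, or equal to, the other."""
    return a in ancestors(b, parent) or b in ancestors(a, parent)

def overlap(s, t, parents):
    """Two segments share data iff they are comparable in every dimension."""
    return all(comparable(a, b, p) for a, b, p in zip(s, t, parents))

def best_segments(deltas, parents, k):
    """Exhaustively pick at most k non-overlapping segments that maximize
    the sum of absolute changes (exponential time; toy inputs only)."""
    best_score, best_set = 0, ()
    for size in range(1, k + 1):
        for combo in combinations(deltas, size):
            if any(overlap(s, t, parents) for s, t in combinations(combo, 2)):
                continue
            score = sum(abs(deltas[s]) for s in combo)
            if score > best_score:
                best_score, best_set = score, combo
    return best_score, best_set

score, chosen = best_segments(DELTAS, PARENTS, k=2)
print(score, chosen)
```

With `k=2` on this toy data, the search selects the US and UK segments. The exhaustive search grows exponentially with `k` and the number of segments, which is consistent with the abstract's motivation for an approximation algorithm.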
Keywords
explanation, business metrics, decision support, business intelligence