ExplainED: Explanations for EDA Notebooks

Proceedings of the VLDB Endowment (PVLDB) 13, no. 12 (2020): 2917-2920

Abstract

Exploratory Data Analysis (EDA) is an essential yet highly demanding task. To get a head start before exploring a new dataset, data scientists often prefer to view existing EDA notebooks - illustrative exploratory sessions that were created by fellow data scientists who examined the same dataset and shared their notebooks online…

Introduction
  • Exploratory Data Analysis (EDA) is an important step in any data science (DS) pipeline.
  • The authors demonstrate the usefulness of the explanations generated by ExplainED on real-life, undocumented EDA notebooks.
  • PVLDB Reference Format: Daniel Deutch, Amir Gilad, Tova Milo, and Amit Somech. ExplainED: Explanations for EDA Notebooks. PVLDB, 13(12): 2917-2920, 2020.
Highlights
  • Exploratory Data Analysis (EDA) is an important step in any data science (DS) pipeline
  • We demonstrate the usefulness of the explanations generated by ExplainED on real-life, undocumented EDA notebooks
  • An EDA notebook contains a curated summary of an EDA process, presented through a notebook interface – a literate programming environment that allows users to document a sequence of programmatic operations and their results, as well as add free-text explanations
  • We will first present the audience with an undocumented EDA notebook, then reveal the explanations generated by ExplainED for each exploratory step
  • As explained in the sequel, ExplainED analyzes the interestingness of each EDA operation qi before producing an explanation that describes what exactly is interesting in the resulting view Vi
  • Given a view Vi in an EDA notebook, we first assess its interestingness w.r.t. the measures defined above, derive which specific elements in the view have the highest impact on the interestingness score of the view, and present them in an illustrative, Natural Language (NL) template (a minimal sketch follows this list)
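As a rough illustration of these two ideas (an interestingness measure mapping a view to a number, and an NL template over the highest-impact elements), here is a minimal sketch assuming pandas DataFrame views. The measure `zscore_interestingness`, the helper `fill_template`, and the toy flight-delay data are hypothetical and purely illustrative, not the actual measures or templates used by ExplainED:

```python
import pandas as pd

def zscore_interestingness(view: pd.DataFrame, column: str) -> float:
    """Toy interestingness measure: the largest absolute z-score in `column`.
    A view containing a strong outlier scores high; a uniform view scores low."""
    col = view[column].dropna()
    if len(col) < 2 or col.std(ddof=0) == 0:
        return 0.0
    z = (col - col.mean()).abs() / col.std(ddof=0)
    return float(z.max())

def fill_template(view: pd.DataFrame, column: str, top_k: int = 2) -> str:
    """Toy NL template: name the rows that drive the measure."""
    col = view[column].dropna()
    std = col.std(ddof=0) or 1.0
    z = (col - col.mean()).abs() / std
    top = z.sort_values(ascending=False).head(top_k)
    items = ", ".join(f"row {idx} ({view.loc[idx, column]})" for idx in top.index)
    return f"This view is interesting because {column} contains outlying values: {items}."

# Usage on a tiny example view (hypothetical flight-delay data)
flights = pd.DataFrame({"delay": [3, 5, 4, 180, 6]})
print(zscore_interestingness(flights, "delay"))  # high score, driven by the 180-minute delay
print(fill_template(flights, "delay"))
```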
Results
  • ExplainED uses Shapley values to measure the contribution of each tuple to the interestingness score of the view (see the computational sketch after this list).
  • ExplainED takes as input a view from a given EDA notebook, and generates a textual explanation as follows: First, the interestingness of the view is evaluated using several measures, each corresponding to a different interestingness facet.
  • Focusing on the measure that yielded the highest score, ExplainED computes the Shapley values of the top-k elements in the view w.r.t. the interestingness measure.
  • The authors will first present the audience with an undocumented EDA notebook, then reveal the explanations generated by ExplainED for each exploratory step.
  • The authors define the data model for EDA notebooks and the considered interestingness measures.
  • Given a view Vi, ExplainED generates an explanatory text Ei, which highlights the elements that are interesting in Vi. For example, see the generated explanations in the red frames in Figure 1.
  • As explained in the sequel, ExplainED analyzes the interestingness of each EDA operation qi before producing an explanation that describes what exactly is interesting in the resulting view Vi. An interestingness measure I is a function mapping each view to a real number.
  • Given a view Vi in an EDA notebook, the authors first assess its interestingness w.r.t. the measures defined above, derive which specific elements in the view have the highest impact on the interestingness score of the view, and present them in an illustrative, NL template.
  • The authors formalize the definition of a Shapley value of an element in a view w.r.t. an interestingness measure as follows.
  • Given an EDA view Vi generated from Vi−1 and an interestingness measure I, the Shapley value of an element e ∈ Vi is defined as: Shap(Vi, I, e) = Σ_{S ⊆ Vi \ {e}} [ |S|! · (|Vi| − |S| − 1)! / |Vi|! ] · ( I(S ∪ {e}) − I(S) )
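The formula above can be illustrated with a brute-force computation that treats each row of the view as a player. This is only a minimal sketch: the function names `shapley_values`, `explain`, and `max_zscore`, the toy flights data, and the exhaustive subset enumeration are illustrative assumptions, not the system's actual implementation (which would require approximation for realistically sized views, since exact Shapley computation is exponential in the number of rows):

```python
import itertools
import math
import pandas as pd

def shapley_values(view: pd.DataFrame, measure) -> dict:
    """Exact Shapley value of each row (element) of `view` w.r.t. an
    interestingness measure, i.e. a function mapping a sub-view to a float.
    Brute force over all subsets of the remaining rows (exponential in |view|)."""
    rows = list(view.index)
    n = len(rows)
    shap = {}
    for e in rows:
        others = [r for r in rows if r != e]
        value = 0.0
        for size in range(len(others) + 1):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                gain = measure(view.loc[list(subset) + [e]]) - measure(view.loc[list(subset)])
                value += weight * gain
        shap[e] = value
    return shap

def explain(view: pd.DataFrame, measure, k: int = 2) -> str:
    """Name the k rows with the largest Shapley values."""
    shap = shapley_values(view, measure)
    top = sorted(shap, key=shap.get, reverse=True)[:k]
    return f"Rows {top} contribute most to this view's interestingness."

# Usage: interestingness of a view = largest absolute z-score of its 'delay' column
def max_zscore(v: pd.DataFrame) -> float:
    col = v["delay"].dropna()
    if len(col) < 2 or col.std(ddof=0) == 0:
        return 0.0
    return float(((col - col.mean()).abs() / col.std(ddof=0)).max())

flights = pd.DataFrame({"delay": [3, 5, 4, 180, 6]})
print(explain(flights, max_zscore))  # the 180-minute outlier row has the largest Shapley value
```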
Conclusion
  • The authors will demonstrate the explanations that ExplainED generates for EDA notebooks and their usefulness over the Kaggle Flights dataset.
  • The authors will employ ExplainED to dynamically generate an explanation for each view in the notebook, demonstrating the value of the explanations to the data analysis process.
  • Technical Details: The authors will let participants look under the hood of ExplainED by showing the manner in which it selects the most relevant interestingness measures and finds the interesting tuples or groups in views based on their Shapley values.
Related work
  • Various methods of explaining query results have been proposed in the literature. Prominent examples include explanations based on provenance [7, 2], interventions [13], influence [17], Shapley values [9], and natural language [4], among others. The main difference between these works and ours is that they explain which input tuples affected the output of a query, whereas we try to find the input tuples that make the view interesting (i.e., that most affect the view's interestingness score). There are also other tools for assisting users in composing EDA steps, for example, recommendations of EDA next-steps (e.g., [12]) and highlighting promising features to explore (e.g., [6]). However, such tools do not explain why the generated views are considered interesting.
Funding
  • This research has been funded by the Israeli Science Foundation (ISF), the Binational US-Israel Science Foundation, the Tel Aviv University Data Science center, the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant agreement No. 804302), and the Google Ph.D. Fellowship.
References
  • [2] P. Buneman, S. Khanna, and W. Tan. Why and where: A characterization of data provenance. In ICDT, pages 316–330, 2001.
  • [3] V. Chandola and V. Kumar. Summarization - compressing data into an informative representation. KAIS, 12(3), 2007.
  • [4] D. Deutch, N. Frost, and A. Gilad. Provenance for natural language queries. PVLDB, 10(5):577–588, 2017.
  • [5] L. Geng and H. J. Hamilton. Interestingness measures for data mining: A survey. CSUR, 2006.
  • [6] A. Giuzio, G. Mecca, E. Quintarelli, M. Roveri, D. Santoro, and L. Tanca. Indiana: An interactive system for assisting database exploration. Information Systems, 83:40–56, 2019.
  • [7] T. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, pages 31–40, 2007.
  • [8] M. B. Kery, M. Radensky, M. Arya, B. E. John, and B. A. Myers. The story in the notebook: Exploratory data science using a literate programming tool. In CHI, 2018.
  • [9] E. Livshits, L. E. Bertossi, B. Kimelfeld, and M. Sebag. The shapley value of tuples in query answering. In ICDT, pages 20:1–20:19, 2020.
  • [10] S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. In NIPS, 2017.
  • [11] T. Milo, C. Ozeri, and A. Somech. Predicting "what is interesting" by mining interactive-data-analysis session logs. In EDBT, 2019.
  • [12] T. Milo and A. Somech. Next-step suggestions for modern interactive data analysis platforms. In KDD, 2018.
  • [13] S. Roy and D. Suciu. A formal approach to finding explanations for database queries. In SIGMOD, pages 1579–1590, 2014.
  • [15] E. Strumbelj and I. Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst., 41(3):647–665, 2014.
  • [17] E. Wu and S. Madden. Scorpion: Explaining away outliers in aggregate queries. PVLDB, 6(8):553–564, 2013.