Managing Messes in Computational Notebooks

CHI 2019, Paper 270. Best Paper Award.

Keywords:
clutter, code history, computational notebooks, exploratory programming, inconsistency

Abstract:

Data analysts use computational notebooks to write code for analyzing and visualizing data. Notebooks help analysts iteratively write analysis code by letting them interleave code with output, and selectively execute cells. However, as analysis progresses, analysts leave behind old code and outputs, and overwrite important code, producing…

Introduction
  • Data analysts often engage in “exploratory programming” as they write and refine code to understand unfamiliar data, test hypotheses, and build models [19].
  • A notebook’s user interface is a collection of code editors, called “cells.” At any time, the user can submit code from any cell to a hidden interpreter session.
  • This design leads to three types of messes common to notebooks: disorder, where the interpreter runs code in a different order than it is presented in the cells; deletion, where the user deletes or overwrites the contents of a cell, but the interpreter retains the effect of the cell’s code; and dispersal, where the code that generates a result is spread across many distant cells. (A small sketch of how disorder and deletion arise appears after this list.)
  • While analysts often wish to share their findings with others [16, 21, 33], they are often reluctant to do so until they have cleaned their code [31, 33].
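    The following sketch is our own minimal illustration, not code from the paper: it models a notebook kernel as a persistent Python namespace to show how disorder and deletion arise. All names (kernel, run_cell, the cell variables) are hypothetical.

        # A notebook "kernel" is a namespace that persists across cell runs,
        # independent of what the cells on screen currently say.
        kernel = {}

        def run_cell(source: str) -> None:
            """Execute a cell's source against the shared kernel namespace."""
            exec(source, kernel)

        # Cells as listed in the notebook, top to bottom:
        cell_a = "total = price * quantity"   # listed first
        cell_b = "price, quantity = 10, 3"    # listed second

        # Disorder: the user executes the second-listed cell first, so the
        # interpreter's execution order differs from the presented order.
        run_cell(cell_b)
        run_cell(cell_a)
        print(kernel["total"])      # 30 -- yet reading top-to-bottom would fail

        # Deletion: the user overwrites cell_b's source, but the kernel still
        # retains the effect of the old code.
        cell_b = "price = 99"
        print(kernel["quantity"])   # 3 -- produced by code no longer visible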
Highlights
  • Analysts frequently use computational notebooks, which supplement the rapid iteration of an interpreted programming language with the ability to edit code in place and see computational results interleaved with the code.
  • For the millions of users of notebooks [17], such messes are quite common: for instance, nearly half of public notebooks on GitHub include cells that were executed in a different order than they are listed [30].
  • We introduce a suite of interactive tools, code gathering tools, as an extension to computational notebooks
  • Our qualitative usability study with 12 professional data scientists confirmed that cleaning computational notebooks is primarily about removing unwanted analysis code and results.
Methods
  • DESIGN MOTIVATIONS: The authors conducted formative interviews with eight data analysts and builders of tools for data analysis at a large, data-driven software company.
  • Analysts expressed the most enthusiasm for tools to help them clean their results and explore past variants of their code.
  • These conversations and a review of the related literature yielded several key ideas that guided the design of notebook cleaning tools.
  • Analysts have diverse personal preferences about whether and how to organize and manage versions of code.
  • The tools silently collect history and provide access to the code that produced any visible result.
Results
  • The authors refer to the 12 analysts from the study with the pseudonyms P1–12.
  • The meaning of “cleaning”: Before giving analysts the tutorial about code gathering tools, the authors first asked them to describe their cleaning practice and to clean a notebook in their usual way.
  • One analyst’s description of cleaning is surprisingly close to the code gathering algorithm: “So I picked a plot that looked interesting and that’s maybe something I would want to share with someone and if you think of a dependency tree of cells, sort of walked backwards, removed everything that wasn’t necessary” (P10). (A sketch of this walk-backwards gathering appears after this list.)
  • In their everyday work, some analysts clean by deleting unwanted cells, but most copy/paste desired cells to a fresh notebook. (One analyst who cleans by deletion initially found the non-destructive nature of code gathering to be unintuitive, but adjusted after practice (P4).) Many described the process as error-prone and said they frequently re-execute the cleaned notebook to check that nothing is broken.
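    P10’s “walked backwards” description closely matches a cell-level backward slice. The sketch below is our own construction under that reading, not the paper’s implementation (the paper describes slicing over silently collected execution history at finer granularity); Cell, gather, and the def/use sets are hypothetical, with def/use assumed to be extracted beforehand (e.g., by AST analysis).

        from dataclasses import dataclass, field

        @dataclass
        class Cell:
            source: str
            defines: set = field(default_factory=set)  # names the cell assigns
            uses: set = field(default_factory=set)     # names the cell reads

        def gather(cells: list, target: int) -> list:
            """Return the minimal cells, in order, needed to reproduce cells[target]."""
            needed = set(cells[target].uses)
            keep = {target}
            for i in range(target - 1, -1, -1):        # walk backwards from the result
                if cells[i].defines & needed:          # this cell supplies a needed name
                    keep.add(i)
                    # Its definitions are now satisfied; its own inputs become needs.
                    needed = (needed - cells[i].defines) | cells[i].uses
            return [cells[i] for i in sorted(keep)]

        cells = [
            Cell("df = load('data.csv')", defines={"df"}),
            Cell("junk = explore(df)", defines={"junk"}, uses={"df"}),
            Cell("model = fit(df)", defines={"model"}, uses={"df"}),
            Cell("plot(model)", uses={"model"}),
        ]
        for cell in gather(cells, target=3):
            print(cell.source)  # keeps cells 0, 2, 3; drops the 'junk' exploration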
Conclusion
  • DISCUSSION AND FUTURE WORK: To help analysts manage messes in their code, the authors offer tool builders the following suggestions: Support a broad set of notebook cleaning tasks.
  • While slicing and ordering code is a key step in cleaning notebooks, analysts still need support for many other cleaning tasks
  • This includes refactoring code, restructuring notebooks, polishing visualizations, and providing additional documentation to explain the code and results.
  • Participants found the primary cleaning task to be clerical and error-prone.
  • They responded positively to the code gathering tools, which automatically produce the minimal code necessary to replicate a chosen set of analysis results, using a novel application of program slicing.
  • Analysts primarily used code gathering as a “finishing move” to share work, and found unanticipated uses like generating reference material, creating lightweight branches in their code, and creating summaries for multiple audiences.
Contributions
  • Introduces code gathering tools, extensions to computational notebooks that help analysts find, clean, recover, and compare versions of code in cluttered, inconsistent notebooks.
  • Aims to improve the state of the art in tools for managing messes in notebooks
  • Found that affordances for gathering code to a notebook were both valued and versatile, enabling analysts to clean notebooks for multiple audiences, generate personal reference material, and perform lightweight branching.
References
  • [1] Joel Brandt, Vignan Pattamatta, William Choi, Ben Hsieh, and Scott R. Klemmer. 2010. Rehearse: Helping Programmers Adapt Examples by Visualizing Execution and Highlighting Related Code. Technical Report. Stanford University.
  • [2] Brian Burg, Richard Bailey, Andrew J. Ko, and Michael D. Ernst. 2013. Interactive Record/Replay for Web Application Debugging. In Proceedings of the ACM Symposium on User Interface Software and Technology. ACM, 473–483.
  • [3] Brian Burg, Andrew J. Ko, and Michael D. Ernst. 2015. Explaining Visual Changes in Web Interfaces. In Proceedings of the ACM Symposium on User Interface Software and Technology. ACM, 259–269.
  • [4] Steven P. Callahan, Juliana Freire, Emanuele Santos, Carlos E. Scheidegger, Cláudio T. Silva, and Huy T. Vo. 2006. VisTrails: Visualization meets Data Management. In Proceedings of the ACM International Conference on Management of Data. ACM, 745–747.
  • [5] Mihai Codoban, Sruti Srinivasa Ragavan, Danny Dig, and Brian Bailey. 2015. Software History under the Lens: A Study on Why and How Developers Examine It. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution. IEEE, 1–10.
  • [6] Paul A. Gross, Micah S. Herstand, Jordana W. Hodges, and Caitlin L. Kelleher. 2010. A Code Reuse Interface for Non-Programmer Middle School Students. In Proceedings of the International Conference on Intelligent User Interfaces. ACM, 219–228.
  • [7] Philip J. Guo and Margo Seltzer. 2012. BURRITO: Wrapping Your Lab Notebook in Computational Infrastructure. In Proceedings of the USENIX Workshop on the Theory and Practice of Provenance (TaPP ’12).
  • [8] Björn Hartmann, Loren Yu, Abel Allison, Yeonsoo Yang, and Scott R. Klemmer. 2008. Design as Exploration: Creating Interface Alternatives through Parallel Authoring and Runtime Tuning. In Proceedings of the ACM Symposium on User Interface Software and Technology. ACM, 91–100.
  • [9] Andrew Head, Elena L. Glassman, Björn Hartmann, and Marti A. Hearst. 2018. Interactive Extraction of Examples from Existing Code. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM, Article 85.
  • [10] Brian Hempel, Justin Lubin, Grace Lu, and Ravi Chugh. 2018. Deuce: A Lightweight User Interface for Structured Editing. In Proceedings of the ACM/IEEE International Conference on Software Engineering. ACM/IEEE, 654–664.
  • [11] Joshua Hibschman and Haoqi Zhang. 2015. Unravel: Rapid Web Application Reverse Engineering via Interaction Recording, Source Tracing, and Library Detection. In Proceedings of the ACM Symposium on User Interface Software and Technology. ACM, 270–279.
  • [12] Joshua Hibschman and Haoqi Zhang. 2016. Telescope: Fine-Tuned Discovery of Interactive Web UI Feature Implementation. In Proceedings of the ACM Symposium on User Interface Software and Technology. ACM, 233–245.
  • [13] Reid Holmes and Robert J. Walker. 2012. Systematizing Pragmatic Software Reuse. ACM Transactions on Software Engineering and Methodology 21, 4, Article 20 (2012).
  • [14] Jison. http://jison.org
  • [15] Jupyter. http://jupyter.org/
  • [16] Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, and Jeffrey Heer. 2012. Enterprise Data Analysis and Visualization: An Interview Study. IEEE Transactions on Visualization and Computer Graphics 18, 12 (2012), 2917–2926.
  • [17] Kyle Kelley and Brian Granger. 2017. Jupyter Frontends: From the Classic Jupyter Notebook to JupyterLab, nteract, and Beyond. Video. In JupyterCon. https://www.youtube.com/watch?v=YKmJvHjTGAM
  • [18] Mary Beth Kery, Amber Horvath, and Brad Myers. 2017. Variolite: Supporting Exploratory Programming by Data Scientists. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM, 1265–1276.
  • [19] Mary Beth Kery and Brad A. Myers. 2017. Exploring Exploratory Programming. In Proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing. IEEE, 25–29.
  • [20] Mary Beth Kery and Brad A. Myers. 2018. Interactions for Untangling Messy History in a Computational Notebook. In Proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing. IEEE, 147–155.
  • [21] Mary Beth Kery, Marissa Radensky, Mahima Arya, Bonnie E. John, and Brad A. Myers. 2018. The Story in the Notebook: Exploratory Data Science using a Literate Programming Tool. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM, Article 174.
  • [22] Andrew Ko and Brad A. Myers. 2009. Finding Causes of Program Output with the Java Whyline. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM, 1569–1578.
  • [23] Yun Young Lee, Nicholas Chen, and Ralph E. Johnson. 2013. Drag-and-Drop Refactoring: Intuitive and Efficient Program Transformation. In Proceedings of the IEEE International Conference on Software Engineering. IEEE, 23–32.
  • [24] Yun Young Lee, Darko Marinov, and Ralph E. Johnson. 2015. Tempura: Temporal Dimension for IDEs. In Proceedings of the IEEE/ACM International Conference on Software Engineering, Vol. 1. IEEE/ACM, 212–222.
  • [25] Josip Maras, Maja Stula, Jan Carlson, and Ivica Crnković. 2013. Identifying Code of Individual Features in Client-Side Web Applications. IEEE Transactions on Software Engineering 39, 12 (2013), 1680–1697.
  • [26] Emerson Murphy-Hill and Andrew P. Black. 2008. Refactoring Tools: Fitness for Purpose. IEEE Software 25, 5 (2008).
  • [27] Christopher Oezbek and Lutz Prechelt. 2007. JTourBus: Simplifying Program Understanding by Documentation that Provides Tours Through the Source Code. In Proceedings of the IEEE International Conference on Software Maintenance. IEEE, 64–73.
  • [28] Stephen Oney and Brad Myers. 2009. FireCrystal: Understanding Interactive Behaviors in Dynamic Web Pages. In Proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing. IEEE, 105–108.
  • [29] João Felipe Pimentel, Juliana Freire, Leonardo Murta, and Vanessa Braganholo. 2016. Fine-Grained Provenance Collection over Scripts Through Program Slicing. In Proceedings of the International Provenance and Annotation Workshop. Springer, 199–203.
  • [30] Adam Rule. 2018. Design and Use of Computational Notebooks. Ph.D. Dissertation. University of California San Diego.
  • [31] Adam Rule, Ian Drosos, Aurélien Tabard, and James D. Hollan. 2018. Aiding Collaborative Reuse of Computational Notebooks with Annotated Cell Folding. In Proceedings of the ACM Conference on Computer-Supported Cooperative Work and Social Computing. ACM, Article 150.
  • [32] Adam Rule, Aurélien Tabard, and James D. Hollan. Data from: Exploration and Explanation in Computational Notebooks. https://doi.org/10.6075/J0JW8C39
  • [33] Adam Rule, Aurélien Tabard, and James D. Hollan. 2018. Exploration and Explanation in Computational Notebooks. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM, Article 32.
  • [34] Francisco Servant and James A. Jones. 2012. History Slicing: Assisting Code-Evolution Tasks. In Proceedings of the ACM International Symposium on the Foundations of Software Engineering. ACM, Article 43.
  • [35] Sruti Srinivasa Ragavan, Sandeep Kaur Kuttal, Charles Hill, Anita Sarma, David Piorkowski, and Margaret Burnett. 2016. Foraging among an Overabundance of Similar Variants. In Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM.
  • [36] Ryo Suzuki. 2015. Interactive and Collaborative Source Code Annotation. In Proceedings of the IEEE/ACM International Conference on Software Engineering, Vol. 2. IEEE, 799–800.
  • [37] Unofficial Jupyter Notebook Extensions. https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/
  • [38] Mark Weiser. 1981. Program Slicing. In Proceedings of the International Conference on Software Engineering. IEEE, 439–449.
  • [39] YoungSeok Yoon and Brad A. Myers. 2012. An Exploratory Study of Backtracking Strategies Used by Developers. In Proceedings of the International Workshop on Cooperative and Human Aspects of Software Engineering. IEEE, 138–144.
  • [40] YoungSeok Yoon and Brad A. Myers. 2015. Supporting Selective Undo in a Code Editor. In Proceedings of the IEEE/ACM International Conference on Software Engineering, Vol. 1. IEEE/ACM, 223–233.