HARK No More: On the Preregistration of CHI Experiments

CHI, 2018.

Cited by: 24
Keywords:
human-computer interaction; H.5.m; experimental preregistration; null-hypothesis significance testing; empirical research

Abstract:

Experimental preregistration is required for publication in many scientific disciplines and venues. When experimental intentions are preregistered, reviewers and readers can be confident that experimental evidence in support of reported hypotheses is not the result of HARKing, which stands for Hypothesising After the Results are Known. We...

Introduction
  • Researchers in HCI use a wide variety of evaluation methods, including qualitative and quantitative techniques, in field and laboratory settings, taking objective and subjective measurements or observations.
  • Brief Summary of Inferential Statistics and NHST: One of the main reasons for conducting statistical analyses in HCI is to draw inferences about how a population of users would interact with a particular form of user interface, compared to the existing state of the art (a minimal illustrative analysis appears after this list).
  • Such analyses ask whether the population of potential users would be faster or more accurate with the new interface, or would prefer it, if it were broadly disseminated.
  • External validity means that experimental results generalise beyond the tested settings – for example, if externally valid, a lab study showing a 15% performance improvement with a new mouse would imply a 15% improvement when the same mouse is used in practice.
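To make the analysis described above concrete, the following is a minimal illustrative sketch in Python. It is not taken from the paper: the fabricated data, the assumed ~15% speed-up, and the choice of a paired t-test are assumptions made purely for demonstration.

    # Minimal sketch of a typical NHST analysis in HCI (all data fabricated).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=0)
    n_participants = 20

    # Hypothetical seconds-per-task for each participant under both interfaces.
    old_ui = rng.normal(loc=10.0, scale=2.0, size=n_participants)
    new_ui = 0.85 * old_ui + rng.normal(scale=1.0, size=n_participants)  # ~15% faster

    # Paired t-test: the null hypothesis is "no difference in mean completion time".
    t_stat, p_value = stats.ttest_rel(old_ui, new_ui)
    improvement = 100.0 * (1.0 - new_ui.mean() / old_ui.mean())
    print(f"improvement: {improvement:.1f}%, t = {t_stat:.2f}, p = {p_value:.4f}")

    # A small p-value licenses rejecting the null hypothesis for the sampled
    # setting; whether the gain carries over to everyday use is the separate
    # question of external validity.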
Highlights
  • Researchers in HCI use a wide variety of evaluation methods, including qualitative and quantitative techniques, in field and laboratory settings, and taking objective and subjective measurements or observations
  • Null-hypothesis significance testing has been a key component of the scientific method for many decades
  • While the majority of the review concerns null-hypothesis significance testing (NHST), this emphasis stems from its prevalence in HCI research; many of the issues raised are not limited to NHST
  • Many scientific disciplines have concluded that experimental preregistration is an essential part of their discipline
  • The nature of HCI empirical research raises some challenges for preregistration, notably its strong and appropriate reliance on iterative exploratory design and evaluation
  • Exploratory studies aside, much of the research knowledge within HCI is derived from formal experiments that make use of null-hypothesis significance testing, and for these studies our discipline is every bit as susceptible to the problems of absent preregistration as any other; the potential benefits are equivalent too
Conclusion
  • Infrastructure: What can ACM do? Several disciplines within Computer Science and Information Systems make extensive use of experimental methods, with HCI being among the largest.
  • Modifying the digital library to host an experimental registry would be a substantial undertaking, but one which would serve the goal of ‘aggressively developing the highest-quality content’.
  • Many scientific disciplines have concluded that experimental preregistration is an essential part of their discipline.
  • Preregistration overcomes or reduces many substantial problems that are demonstrable in those disciplines’ literature, including HARKing, publication bias, and the file drawer effect.
  • HCI research can sharpen its methods and rigour by introducing experimental preregistration.
Overview
  • Reviews the motivation and outcomes of experimental preregistration across a variety of disciplines, as well as previous work commenting on the role of evaluation in HCI research
  • Focuses primarily on issues arising from publication bias – the tendency for papers that reject the null hypothesis to be accepted at a much higher rate than those that do not – while acknowledging that the range of NHST criticisms is broader (a toy simulation of this publication filter appears after this list)
  • Discusses the potential benefits and costs associated with the use of preregistration within HCI
  • Reviews literature that has raised issues relating to the conduct of experiments and data analysis within HCI
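As a companion to the publication-bias bullet above, here is a small hedged simulation in Python. None of it comes from the paper; the sample sizes and thresholds are arbitrary choices that merely illustrate how accepting only significant results fills a literature with spurious effects.

    # Toy model of the publication filter: every simulated study compares two
    # conditions whose true difference is zero; only p < .05 gets "published".
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=1)
    n_studies, n_per_group = 10_000, 20
    published = []

    for _ in range(n_studies):
        a = rng.normal(size=n_per_group)   # no true effect in either condition
        b = rng.normal(size=n_per_group)
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:                       # the publication-bias filter
            published.append(abs(a.mean() - b.mean()))

    rate = 100.0 * len(published) / n_studies
    print(f"published {len(published)}/{n_studies} null studies ({rate:.1f}%)")
    print(f"mean |difference| among published studies: {np.mean(published):.2f} SD")

Only the roughly 5% of studies that clear the filter reach the literature, and each reports a sizeable spurious difference; the remainder stay in the file drawer, which is the problem that preregistration and experiment registries aim to mitigate.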
Reference
  • APA. 2010. Publication Manual of the American Psychological Association (6th ed.). American Psychological Association.
  • Nikola Banovic. 2016. To Replicate or Not to Replicate? GetMobile: Mobile Comp. and Comm. 19, 4 (March 2016), 23–27. DOI: http://dx.doi.org/10.1145/2904337.2904346
  • Mario Biagioli. 2016. Watch out for cheats in citation game. Nature 535, 7611 (Jul 14 2016), 201. DOI: http://dx.doi.org/10.1038/535201a
  • J. P. Boissel. 1993. International Collaborative Group on Clinical Trial Registries: Position paper and consensus recommendations on clinical trial registries. Clinical Trials and Meta-Analysis 28, 4-5 (1993), 255–266.
  • Paul Cairns. 2007. HCI... Not As It Should Be: Inferential Statistics in HCI Research. In Proceedings of the 21st British HCI Group Annual Conference on People and Computers: HCI...But Not As We Know It - Volume 1 (BCS-HCI ’07). British Computer Society, Swinton, UK, 195–201. http://dl.acm.org/citation.cfm?id=1531294.1531321
  • Lucas C. Coffman and Muriel Niederle. 2015. Pre-analysis Plans Have Limited Upside, Especially Where Replications Are Feasible. Journal of Economic Perspectives 29, 3 (September 2015), 81–98. DOI: http://dx.doi.org/10.1257/jep.29.3.81
  • Jacob Cohen. 1990. Things I have learned (so far). American Psychologist 45, 12 (1990), 1304–1312.
  • G. Cumming. 2012. Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-analysis. Routledge.
  • D. Rennie. 2004. Trial registration: A great idea switches from ignored to irresistible. JAMA 292, 11 (2004), 1359–1362. DOI: http://dx.doi.org/10.1001/jama.292.11.1359
  • K. Dickersin, S. Chan, T. C. Chalmers, H. S. Sacks, and H. Smith Jr. 1987. Publication bias and clinical trials. Controlled Clinical Trials 8, 4 (1987), 343–353.
  • Pierre Dragicevic. 2016. Fair Statistical Communication in HCI. In Modern Statistical Methods for HCI, Judy Robertson and Maurits Kaptein (Eds.). Springer International Publishing, Cham, 291–330. DOI: http://dx.doi.org/10.1007/978-3-319-26633-6_13
  • Wolfgang Forstmeier, Eric-Jan Wagenmakers, and Timothy H. Parker. 2016. Detecting and avoiding likely false-positive findings – a practical guide. Biological Reviews (2016). DOI: http://dx.doi.org/10.1111/brv.12315
  • Annie Franco, Neil Malhotra, and Gabor Simonovits. 2014. Publication bias in the social sciences: Unlocking the file drawer. Science 345, 6203 (2014), 1502–1505. DOI: http://dx.doi.org/10.1126/science.1255484
  • C. A. E. Goodhart. 1984. Problems of Monetary Management: The UK Experience. Macmillan Education UK, London, 91–121. DOI: http://dx.doi.org/10.1007/978-1-349-17295-5_4
  • Saul Greenberg and Bill Buxton. 2008. Usability evaluation considered harmful (some of the time). In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’08). ACM, New York, NY, USA, 111–120. DOI: http://dx.doi.org/10.1145/1357054.1357074
  • Kasper Hornbæk, Søren S. Sander, Javier Andrés Bargas-Avila, and Jakob Grue Simonsen. 2014. Is Once Enough? On the Extent and Content of Replications in Human-Computer Interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’14). ACM, New York, NY, USA, 3523–3532. DOI: http://dx.doi.org/10.1145/2556288.2557004
  • George S. Howard, Scott E. Maxwell, and Kevin J. Fleming. 2000. The proof of the pudding: An illustration of the relative strengths of null hypothesis, meta-analysis, and Bayesian analysis. Psychological Methods 5, 3 (2000), 315–332. DOI: http://dx.doi.org/10.1037/1082-989X.5.3.315
  • Andrew Howes, Benjamin R. Cowan, Christian P. Janssen, Anna L. Cox, Paul Cairns, Anthony J. Hornof, Stephen J. Payne, and Peter Pirolli. 2014. Interaction Science SIG: Overcoming Challenges. In CHI ’14 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’14). ACM, New York, NY, USA, 1127–1130. DOI: http://dx.doi.org/10.1145/2559206.2559208
  • Macartan Humphreys, Raul Sanchez de la Sierra, and Peter van der Windt. 2013. Fishing, Commitment, and Communication: A Proposal for Comprehensive Nonbinding Research Registration. Political Analysis 21, 1 (2013), 1. DOI: http://dx.doi.org/10.1093/pan/mps021
  • John P. A. Ioannidis. 2005. Why Most Published Research Findings Are False. PLOS Medicine 2, 8 (August 2005). DOI: http://dx.doi.org/10.1371/journal.pmed.0020124
  • Leslie K. John, George Loewenstein, and Drazen Prelec. 2012. Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling. Psychological Science 23, 5 (2012), 524–532. DOI: http://dx.doi.org/10.1177/0956797611430953. PMID: 22508865.
  • Valen E. Johnson. 2013. Revised standards for statistical evidence. Proceedings of the National Academy of Sciences 110, 48 (2013), 19313–19317. DOI: http://dx.doi.org/10.1073/pnas.1313476110
  • K. Dickersin and D. Rennie. 2012. The evolution of trial registries and their use to assess the clinical trial enterprise. JAMA 307, 17 (2012), 1861–1864. DOI: http://dx.doi.org/10.1001/jama.2012.4230
  • Robert M. Kaplan and Veronica L. Irvin. 2015. Likelihood of Null Effects of Large NHLBI Clinical Trials Has Increased over Time. PLOS ONE 10, 8 (August 2015), 1–12. DOI: http://dx.doi.org/10.1371/journal.pone.0132382
  • Matthew Kay, Steve Haroz, Shion Guha, and Pierre Dragicevic. 2016a. Special Interest Group on Transparent Statistics in HCI. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA ’16). ACM, New York, NY, USA, 1081–1084. DOI: http://dx.doi.org/10.1145/2851581.2886442
  • Matthew Kay, Gregory L. Nelson, and Eric B. Hekler. 2016b. Researcher-Centered Design of Statistics: Why Bayesian Statistics Better Fit the Culture and Incentives of HCI. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI ’16). ACM, New York, NY, USA, 4521–4532. DOI: http://dx.doi.org/10.1145/2858036.2858465
  • Norbert L. Kerr. 1998. HARKing: Hypothesizing After the Results are Known. Personality and Social Psychology Review 2, 3 (1998), 196.
  • Don Lewis and C. J. Burke. 1949. The use and misuse of the chi-square test. Psychological Bulletin 46, 6 (1949), 433–489. https://www.ncbi.nlm.nih.gov/pubmed/15392587
  • H. Lieberman. 2002. The Tyranny of Evaluation. http://web.media.mit.edu/~lieber/Misc/Tyranny-Evaluation.html. Last accessed: June 21, 2017.
  • Wendy E. Mackay, Caroline Appert, Michel Beaudouin-Lafon, Olivier Chapuis, Yangzhou Du, Jean-Daniel Fekete, and Yves Guiard. 2007. Touchstone: Exploratory Design of Experiments. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’07). ACM, New York, NY, USA, 1425–1434. DOI: http://dx.doi.org/10.1145/1240624.1240840
  • Michael E. J. Masson. 2011. A tutorial on a practical Bayesian alternative to null-hypothesis significance testing. Behavior Research Methods 43, 3 (2011), 679–690. DOI: http://dx.doi.org/10.3758/s13428-010-0049-5
  • Bertrand Meyer. 2012. Incremental Research vs. Paradigm-shift Mania. Commun. ACM 55, 9 (Sept. 2012), 8–9. DOI: http://dx.doi.org/10.1145/2330667.2330670
  • James E. Monogan, III. 2013. A Case for Registering Studies of Political Outcomes: An Application in the 2010 House Elections. Political Analysis 21, 1 (2013), 21. DOI: http://dx.doi.org/10.1093/pan/mps022
  • Robert Rosenthal. 1979. The file drawer problem and tolerance for null results. Psychological Bulletin 86, 3 (1979), 638–641. DOI: http://dx.doi.org/10.1037/0033-2909.86.3.638
  • J. D. Scargle. 1999. Publication Bias (The “File-Drawer Problem”) in Scientific Inference. ArXiv Physics e-prints (Sept. 1999). http://adsabs.harvard.edu/abs/1999physics...9033S
  • Theodore D. Sterling. 1959. Publication Decisions and their Possible Effects on Inferences Drawn from Tests of Significance — or Vice Versa. J. Amer. Statist. Assoc. 54, 285 (1959), 30–34. DOI: http://dx.doi.org/10.1080/01621459.1959.10501497
  • David Trafimow and Michael Marks. 2015. Editorial. Basic and Applied Social Psychology 37, 1 (2015), 1–2. DOI: http://dx.doi.org/10.1080/01973533.2015.1012991
  • Max L. L. Wilson, Paul Resnick, David Coyle, and Ed H. Chi. 2013. RepliCHI: The Workshop. In CHI ’13 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’13). ACM, New York, NY, USA, 3159–3162. DOI: http://dx.doi.org/10.1145/2468356.2479636
  • S. Zhai. 2002. Evaluation is the worst form of HCI research except all those other forms that have been tried. www.shuminzhai.com/papers/EvaluationDemocracy.htm. Last accessed: June 21, 2017.
Best Paper
Best Paper of CHI, 2018