Attitudes toward amalgamating evidence in statistics

semanticscholar (2016)

Abstract
Weighing and amalgamating evidence is a central problem in statistics, giving rise to much debate about which methods are appropriate as well as exactly where, when, and for what purposes they should be used. On the other hand, the weighing and amalgamating of evidence within a single isolated study (across its multiple observations) is, in many default approaches in statistics, surprisingly often just automatic and implicit. Now, vigorous debate on basic approaches in statistics likely comes as no surprise to statisticians and, increasingly, to almost everyone else. Although there is much agreement on the mathematical definitions of terms and procedures in statistics (what they are), as well as on discerning whether particular instances meet these definitions (is it this?), when it comes to the appropriate roles of these terms and procedures in facilitating scientific inquiry, their very purposes and what to make of them, agreement seems beyond reach for the foreseeable future. The tools are largely agreed upon; their appropriate use, where and for what purposes, is not at all. For instance, there is a fair amount of agreement on what probabilities are, but not on what they can be used for. Many frequentists ban any use of probability to represent (uncertain) knowledge of unknown parameters. On the other hand, although almost all Bayesians would use probabilities to represent knowledge (or the lack of it), some Bayesians would ban any testing or empirically based assessment of those probabilities. In the case of a single study, some statisticians would be concerned about properties of procedures that would be discerned if the procedure were applied repeatedly, infinitely often, under similar kinds of studies or even under exactly the same study conditions. Others argue this is not even sensible. Going beyond a single isolated study, the system of scientific publication, criticism, and meta-analysis provides more general avenues for amalgamation of evidence between studies rather than just within a study, and here individual statistical analyses can be understood as (first) steps in this larger process. Perhaps unsurprisingly, in this larger process there are more disagreements, as opinions vary on what contextual (extra-study) information can be brought in, where, and how. Should previous studies be amalgamated in a combined analysis, used only to build a judgement-informed prior, or used only qualitatively to refine what analysis should be done so that the study stands on its own as much as possible? In this article we lay out a general perspective on statistics as primarily about conjecturing, assessing, and adopting idealized representations of reality, predominantly using probability generating models for both parameters and data. That is, an explicit prior probability distribution represents available but rough scientific judgements about what values the unknown parameters might have been set to, and a data-generating probability distribution represents how the recorded data likely came about if the unknown parameters' values were set to specific possible values. This contrasts with another perspective on statistics, as primarily being about discerning procedures with good properties that are uniform over a wide range of possible underlying realities and restricting use, especially in science, to just those procedures. Our perspective is perhaps more inviting of information aggregation, as reality likely has many commonalities that can be discerned and profitably built upon. We believe this is a perspective which can unify seemingly distinct statistical philosophies as well as provide some guidance toward resolving the current replication crisis in science: when claims fail to replicate, the methods used likely did not reflect reality well, if at all.
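As a concrete, hypothetical illustration of the perspective described above (an explicit prior for an unknown parameter together with a data-generating distribution for the observations), the following Python sketch uses a conjugate normal-normal model. It is not code from the paper; the prior mean, prior standard deviation, measurement standard deviation, and simulated "true" value are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2016)

# Conjectured representation of reality (all values are illustrative assumptions).
mu0, tau0 = 0.0, 2.0   # prior: rough scientific judgement of where the unknown mean theta lies
sigma = 1.0            # data-generating model: y_i ~ Normal(theta, sigma) given theta
theta_true = 1.3       # "reality", used here only to simulate data for the sketch

# Simulate one small study.
n = 20
y = rng.normal(theta_true, sigma, size=n)

# Amalgamate the prior judgement with the n observations (conjugate normal-normal update):
# the posterior precision is the sum of the prior precision and the data precision.
prior_prec = 1 / tau0**2
data_prec = n / sigma**2
post_var = 1 / (prior_prec + data_prec)
post_mean = post_var * (prior_prec * mu0 + data_prec * y.mean())

print(f"sample mean    : {y.mean():.3f}")
print(f"posterior mean : {post_mean:.3f}")
print(f"posterior sd   : {np.sqrt(post_var):.3f}")
```

The posterior mean is a precision-weighted compromise between the prior judgement and the observations, which is one simple instance of the within-study amalgamation discussed in the abstract.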
∗ We thank the Office of Naval Research for partial support of this work.
† Department of Statistics and Department of Political Science, Columbia University, New York.
‡ O'Rourke Consulting, Ottawa, Ontario.

1. Statistics as amalgamation of evidence

One of the frustrating (and fascinating) aspects of statistics, compared to many other modern sciences, is its profusion of seemingly incompatible philosophies. The Neyman-Pearson approach is centered around defining procedures for discriminating between hypotheses, targeting uniform type one error for all nulls and uniformly minimum type two error for all alternatives. The Fisherian p-value, in contrast, evaluates the strength of evidence against a single null hypothesis without explicit reference to any alternative, targeting a Uniform(0,1) distribution of p-values under all nulls. Another Fisherian approach, maximum likelihood, provides estimates within a parametric model, targeting asymptotic normality of the maximum likelihood estimate for all (regular) likelihoods. Bayesian inference can be viewed as a generalization of maximum likelihood but is anathema to many because of its assignment of probability distributions to parameters that are not the product of random processes. It targets probability distributions that represent current understanding of the realities and uncertainties involved. Nonparametric approaches such as the bootstrap and the lasso have traditionally been shoehorned into the frameworks of hypothesis testing and interval estimation, but in recent years the machine learning approach has focused not on those classical problems but rather on pure prediction. These approaches target a lessening of assumptions (used to represent current understanding of the realities and uncertainties involved) and a greater reliance on identifying procedures with seemingly good properties.

The decision of what information is to be combined is often dictated by probability models or inferential algorithms that are themselves chosen largely by convention. This occurs for basic users, who are taught to use t-tests for continuous data (group variances assumed common, giving a combined variance estimate with more degrees of freedom), χ2 tests for discrete data (with various choices of common parameters assumed in defining the expectations against which consistency is tested), linear regression models (assuming all observations share common slopes for the explanatory variables fit, as well as a common standard deviation), Cox models for survival data (a common proportional hazard function assumed, so that it cancels out), and so on; but even experienced statisticians often do not seem to be clear about where, in their data analysis, the choices of which information to combine are made.

Even amid the diversity of statistical methods and philosophies, though, all these approaches involve the amalgamation of evidence. This goes from the simplest models of random sampling and independent, identically distributed data; to slightly more elaborate models with hierarchical, time-series, and spatial structure; to elaborate multistage deep learning algorithms combining thousands of predictors or features.
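To make the implicit amalgamation in one of the default procedures above visible, the sketch below (our illustration, not code from the paper, using simulated data) reproduces the equal-variance two-sample t-test by hand, showing where the two group variances are combined into a single pooled estimate, and checks the result against scipy.stats.ttest_ind.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=12)   # group 1 (simulated, illustrative only)
b = rng.normal(0.5, 1.0, size=15)   # group 2

# The "automatic and implicit" amalgamation: both group variances are combined
# into one pooled variance estimate with n_a + n_b - 2 degrees of freedom.
na, nb = len(a), len(b)
pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
se = np.sqrt(pooled_var * (1 / na + 1 / nb))
t_by_hand = (a.mean() - b.mean()) / se
p_by_hand = 2 * stats.t.sf(abs(t_by_hand), df=na + nb - 2)

# The packaged procedure performs the same combination without announcing it.
res = stats.ttest_ind(a, b, equal_var=True)

print(f"by hand : t = {t_by_hand:.4f}, p = {p_by_hand:.4f}")
print(f"scipy   : t = {res.statistic:.4f}, p = {res.pvalue:.4f}")
```

The two results agree exactly; the choice to combine the two groups' variance information is made by the procedure itself, not stated by the user.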
Even something as basic as Fisherian p-values or likelihood-ratio testing can be seen as a way to use the accumulation of data, that is, the piling up of evidence, to draw increasingly certain conclusions. It has been said that the most important aspect of a statistical method is not what it does with the data but rather what data it uses (Gelman, 2015). From that perspective, the power of Bayesian, regularization, and machine-learning methods is that they can incorporate large amounts of data into analysis and decision making. At the same time, as datasets become larger and more diverse, there is an increasing need to model and adjust for differences between the sample (that is, the available data) and the population, and between treatment and control groups in causal analyses. Amalgamation of evidence is important, but it is not trivial; it is not just a matter of throwing data into a blender. One must evaluate data quality to decide what to include. Or, more generally, one must weight and adjust data in light of what is known about the quality and representativeness of the measurements and in light of the consistency of different data sources with the available research hypotheses. Implicitly, these procedures can be seen as deriving from different probabilistic data-generating models and prior distributions.
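As one hedged illustration of weighting rather than blending data sources, the sketch below combines several study-level estimates by inverse-variance weighting and then shows a partially pooled, empirical-Bayes-style alternative. The estimates, standard errors, and the between-study standard deviation tau are made-up values for illustration; in a full analysis tau would itself be estimated or given a prior, and its uncertainty propagated.

```python
import numpy as np

# Hypothetical study-level summaries: effect estimates and standard errors (illustrative values).
y  = np.array([0.28, 0.10, 0.45, 0.05])   # estimated effects from four studies
se = np.array([0.15, 0.10, 0.25, 0.12])   # their standard errors (reflecting size and data quality)

# Complete pooling: inverse-variance weights, so noisier studies count for less.
w = 1 / se**2
mu_hat = np.sum(w * y) / np.sum(w)
mu_se = np.sqrt(1 / np.sum(w))
print(f"pooled estimate: {mu_hat:.3f} (se {mu_se:.3f})")

# Partial pooling: allow real between-study variation tau (assumed known here for simplicity).
# Each study's estimate is shrunk toward the pooled mean in proportion to its own noise.
tau = 0.10
shrink = (1 / se**2) / (1 / se**2 + 1 / tau**2)
theta_partial = shrink * y + (1 - shrink) * mu_hat
for j, (yj, tj) in enumerate(zip(y, theta_partial), start=1):
    print(f"study {j}: raw {yj:+.3f} -> partially pooled {tj:+.3f}")
```

Each of these combinations corresponds to a different implicit data-generating model and prior: complete pooling amounts to assuming tau = 0, while partial pooling assumes the study-level effects are drawn from a common distribution. That is the sense in which the weighting choices above can be seen as deriving from probability models rather than from the data alone.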