Null model analyses are not adequate to summarize strong associations: Rebuttal to Ulrich et al. (2022)

Journal of Biogeography (2024)

Abstract
We recently developed a novel metric of association in pairwise co-occurrence data (Mainali et al., 2022) to address fundamental flaws in traditional indices, as discussed in detail and demonstrated in that paper. Our new metric, the maximum likelihood estimator (MLE) $\hat{\alpha}$ of a statistical parameter $\alpha$, quantifies the degree of association between species occupancy at ecological sites, and it is insensitive to the species prevalences and the number of sites. In contrast, we showed that classic indices of co-occurrence (Jaccard, Simpson, Sørensen–Dice) can be highly sensitive to the fixed margins of $2\times 2$ contingency tables, estimating wildly variable degrees of association and even reversing the direction of association for tables with different margins but the same degree-of-association $\alpha$. Ulrich et al. (2022), hereafter USSG, commented adversely on our paper, claiming that our metric lacked novelty and insights beyond null-hypothesis standardization techniques. In this commentary, we address each of USSG's specific claims reflecting their view that test statistics for null association adequately summarize strongly non-null association in co-occurrence data. We show that standardized co-occurrence behaves differently in different datasets with the same strongly non-null degree of association, while $\hat{\alpha}$ performs reliably.

If the occupancy counts of two species are $m_A$ and $m_B$, respectively, out of $N$ total sites, then the hypergeometric distribution for their co-occurrence count $X$ underlies Fisher's famous ‘exact test’ (Fisher, 1934) for analysing a contingency table (Mainali et al., 2022) for possible dependence between its row and column categories. Recently, the hypergeometric was introduced in ecology (Griffith et al., 2016; Veech, 2013) as a null distribution for co-occurrence analysis. The FE null model of Gotelli (2000) specifies the same distribution, expressed in terms of stochastic simulations rather than mathematical formulas. While the null model enables a hypothesis test of no association, much of the scientific interest in co-occurrence analysis in ecology and biogeography concerns the degree of association in non-null settings. Null-hypothesis test statistics do not reliably capture the degree of association under diverse fixed table margins.

A statistical analogy crystallizes our view of this problem. Suppose an investigator wants to summarize the results of many coin-toss experiments, in each of which one fair or biased coin is tossed repeatedly. Each experiment records the number $n$ of tosses and the number $X$ of heads. The coin used in each experiment is different, some coins being fair and others biased in different ways. Motivated by the standard hypothesis test that the coin is fair, one could ‘summarize’ these experiments by the standardized count $Z = (X - n/2)/\sqrt{n/4}$ (or the $\mathrm{Binomial}(n, \tfrac{1}{2})$ probability of $X$ or more heads) used to test the fair-coin hypothesis. But is there any researcher, living in a world in which many coins are biased positively and many negatively, who would not instead parameterize the unknown heads probability $p$ in each experiment and estimate it by $\hat{p} = X/n$ (or some similar Bayesian estimator), along with a confidence interval or estimate of the variability of $\hat{p}$? The estimator $\hat{p}$ has a target and a meaning regardless of $n$, but $Z$ does not when $p$ differs from $\tfrac{1}{2}$.
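To make the analogy concrete, the following minimal R sketch (our illustration, not part of the published analysis) simulates repeated biased-coin experiments and shows that the null-standardized count $Z$ drifts with $n$ while $\hat{p}$ stays centred at the true $p$:

```r
# Minimal sketch of the coin-toss analogy: for a biased coin (p != 1/2),
# the fair-coin standardized count Z has no fixed target as n grows,
# whereas the estimator p-hat remains centred at the true p.
set.seed(1)
p <- 0.6                                       # true heads probability (biased coin)
for (n in c(50, 200, 800)) {
  X     <- rbinom(10000, size = n, prob = p)   # 10,000 replicate experiments of n tosses
  Z     <- (X - n/2) / sqrt(n/4)               # standardized count under the fair-coin null
  p_hat <- X / n                               # MLE of p
  cat(sprintf("n = %3d:  mean(Z) = %6.2f   mean(p_hat) = %.3f\n",
              n, mean(Z), mean(p_hat)))
}
# mean(Z) grows roughly like 2 * sqrt(n) * (p - 1/2), while mean(p_hat) stays near 0.6.
```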
The situation is very similar for co-occurrences: statistical methods estimating a target parameter reliably group like degrees of association regardless of differing $2\times 2$ table margins.

USSG claim our metric suffers from three ‘problems’: (1) for fixed $2\times 2$ table margins, our $\hat{\alpha}$ is essentially equivalent to null-hypothesis standardized counts or P-values; (2) our affinity model shares with the null model the assumption that all $N$ sites are equally likely to be occupied (separately) by each species once the prevalences $m_A$, $m_B$ are fixed; and (3) the $\hat{\alpha}$ metric is too complicated and numerically unstable as implemented in our R code (now a CRAN package, ‘CooccurrenceAffinity’). By far the largest part of USSG's commentary focusses on their comment (1), interspersed with reanalyses of real and artificial datasets comparing null-standardized statistics with $\hat{\alpha}$. We address all their comments sequentially (as presented in USSG), showing that USSG's data analyses reinforce the merits of $\hat{\alpha}$ rather than of null-standardized indices, and we provide exhibits showing the centring at $\alpha$ and the stability of the distribution of $\hat{\alpha}$ across a variety of margins $(m_A, m_B, N)$.

In their first comment, USSG assume the common (hypergeometric) form of null model. We (Mainali et al., 2022) provided four ‘classic assumptions’ that are precisely stated and that imply the hypergeometric null hypothesis under fixed marginals for a co-occurrence table. We did not examine the equivalence of our assumptions with the less precisely stated assumptions of prior studies, including Gotelli (2000) and Wright et al. (1998). But our disagreements with USSG all relate to non-null degrees of association.

USSG's first comment analysed a previously published dataset (Wright et al., 1998). We reproduced the example in which USSG argue that our ‘affinity and Veech (2013)'s probabilistic occurrence yield very similar results on a large set of empirical species pairs’, as seen in the blue cluster of Figure 1a. This observation is correct for values of Veech's probability (pv, the probability of the observed or higher co-occurrence count) ranging from roughly 0.05 to 0.95, that is, the nonsignificant ones (corresponding roughly to standardized co-occurrence between −1.65 and +1.65). However, the most interesting associations lie outside this range, revealing the incompleteness of USSG's view and the novelty of affinity. When co-occurrence counts fall in the statistically significant range, it is not sufficiently informative simply to declare them significant; instead, we should estimate a quantity measuring the strength of association. For this purpose, affinity serves as a more reliable tool than null-hypothesis tail probabilities or the standardized co-occurrence count $Z$. The correspondence between pv and affinity among the blue points in Figure 1a is best expressed by removing the curvature and compression of pv values in the tails through the transformation $\Phi^{-1}(\mathrm{pv})$ = qnorm(pv) (the standard-normal quantile function), upon which the blue points exhibit a linear decrease with slope approximately −1.16 (Figure S1a).
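For readers who wish to reproduce this transformation, here is a minimal R sketch (ours; the margins and count shown are hypothetical) that computes pv for one species pair under the hypergeometric null and applies qnorm:

```r
# Veech's probability pv = P(X >= x) under the hypergeometric null with fixed
# margins (mA, mB, N), followed by the probit transform qnorm(pv) used in the text.
pv_probit <- function(x, mA, mB, N) {
  pv <- phyper(x - 1, m = mA, n = N - mA, k = mB, lower.tail = FALSE)
  c(pv = pv, qnorm_pv = qnorm(pv))
}

pv_probit(x = 18, mA = 25, mB = 30, N = 50)   # hypothetical margins and count
# Small pv values (strong positive association) map to large negative qnorm(pv);
# in moderate-to-large tables, qnorm(pv) and Z are linked almost deterministically
# (see Figure S1f), which is why the text argues interchangeably in terms of both.
```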
In all moderate-to-large $2\times 2$ tables, $Z$ and qnorm(pv) are approximately equivalent (Figure S1f), illustrating a general principle of approximate normality of $Z$ in large $2\times 2$ tables, as explained in the caption of Figure S1f. Therefore, throughout the paper we argue interchangeably in terms of qnorm(pv) and $Z$.

USSG indicated that the points of a separate cluster (red points in Figure 1a) ‘mostly stem from fully nested species pairs, where the occurrences of the less abundant species are a proper subsample of those of the more abundant species’. Indeed, we confirmed that every point of this cluster comes from a fully nested species pair. This behaviour has been discussed in mathematical detail under the ‘ML estimation of $\alpha$’ section of ‘Materials and Methods’ in Mainali et al. (2022). The key point is that whenever the co-occurrence count equals its largest or smallest logically possible value, $\hat{\alpha}$ is positively or negatively infinite (observed in 83% of species pairs in this dataset; pie chart in Figure 1a). In such situations, our software caps the MLE at $\pm\log(2N^2)$, with mathematical justification given at Equation 8 in Mainali et al. (2022); see the red and purple points in Figure 1a. This is a small-sample phenomenon requiring care in reporting the results, because the (positive or negative) strengths of association compatible with the data are unboundedly large. The truncated affinity of species pairs with infinite affinity, plotted against the number of sites as in USSG's Figure 1a inset and our Figure 1b, reveals a deterministic logarithmic relation, an artefact of the way our software truncates positively infinite affinity. It is a feature of log odds ratios, and not of our software, that nested species counts (a co-occurrence count coinciding with one of the species prevalence counts) must be treated as a special case. The recommended way to report nested counts is with the lower endpoint of a one-sided 95% confidence interval when the co-occurrence count is at its highest extreme, and the upper endpoint when it is at its lowest extreme (Figure 1c shows only lower confidence-interval endpoints for nested species counts).

USSG created an artificial ‘compartmented matrix’ of 22 species variously occupying 50 sites, thereby generating 231 species pairs. Since USSG provided no information about how the matrix entries were generated, we fixed the $N = 50$ sites and the counts of sites occupied by each of the 22 species, and then created four co-occurrence counts for each pair of species. For each species pair and margin triple $(m_A, m_B, N)$ of species and total site counts in the compartmented matrix, we took as the four observed co-occurrence counts the medians of the extended hypergeometric distribution (Harkness, 1965) with $\alpha$ fixed, respectively, at $\log(2)$, $\log(2.75)$, $\log(3.5)$ and $\log(4.25)$ (hereafter ‘$\alpha$-specific co-occurrences’). The extended hypergeometric is the exact distribution of $X$ under the affinity model described in Mainali et al. (2022); the values $\log(2) = 0.69$, $\log(2.75) = 1.01$, $\log(3.5) = 1.25$ and $\log(4.25) = 1.45$ represent a range of moderate-to-strong positive associations among species site occurrences.
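A hedged sketch of this construction in R follows, assuming the BiasedUrn package (whose Fisher noncentral hypergeometric coincides with the extended hypergeometric, with odds ratio $e^{\alpha}$) and hypothetical margins:

```r
# alpha-specific co-occurrence counts: medians of the extended hypergeometric
# (= Fisher noncentral hypergeometric with odds ratio exp(alpha)) at fixed margins.
library(BiasedUrn)

alpha_specific_count <- function(mA, mB, N, alpha) {
  # m1 = sites occupied by species A, m2 = remaining sites, n = sites occupied by B
  qFNCHypergeo(0.5, m1 = mA, m2 = N - mA, n = mB, odds = exp(alpha))
}

# Hypothetical margins; the four alpha values named in the text:
sapply(c(log(2), log(2.75), log(3.5), log(4.25)),
       function(a) alpha_specific_count(mA = 20, mB = 15, N = 50, alpha = a))
```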
For these pooled data, the relationship between affinity and qnorm(pv) (Figure 2a) is roughly linear, corresponding to a curvilinear affinity-versus-pv relationship (Figure S1b); but among single-colour species pairs sharing a value of $\alpha$, the relationship is diffuse, especially in the purple and orange colour groups with larger true $\alpha$. The relationship between affinity and $Z$, or the standardized Jaccard index, is much the same (Figure S1c,d). So we see that affinity and $Z$ are not ‘equivalent’.

We demonstrate the superiority of affinity over standardization by evaluating the probability mass function of both indices in settings with known nonzero $\alpha$. We generated data with $\alpha$ fixed at 1.5 and 3 for each of four $(m_A, m_B, N)$ combinations. In all examples, the true probability masses $P(X = k)$ of each possible co-occurrence count $X$ are known. When the corresponding $\hat{\alpha}$ is plotted against probability mass, we observe the behaviour expected of a reliable estimator irrespective of $(m_A, m_B, N)$: centring at the respective true $\alpha$ (Figure 2d). The corresponding standardized co-occurrence count, however, exhibited a complete lack of centring (Figure 2c). This behaviour remains the same for negative associations (Figure S2). The standardized $X$ values depend sensitively on the absolute and relative magnitudes of $(m_A, m_B, N)$ and could mislead investigators into thinking that the degrees of association between species A and B occurrences differed across the four combinations of $(m_A, m_B, N)$ with the same underlying association. In conclusion, $\hat{\alpha}$ estimates a true degree-of-association target, which standardized co-occurrence cannot. The same applies to the standardized Jaccard index (Figure S1e) and qnorm-transformed pv (Figure S1f) because of their linear mapping with standardized co-occurrence.

USSG analysed their compartmented-matrix data and concluded that ‘the affinity index proved to be equivalent to the standardized effect size of the traditional Jaccard metric’ (Appendix S2) by showing in their Figure 1b an almost linear relationship between the two, with all points nearly overlaying the trendline. This approximately linear relationship is observed mainly in the severely limited range of moderate strengths of association seen in this artificial matrix: 87% of the affinity values were between −1 and 1.5 in USSG's original ‘compartmented matrix’. Using our ‘$\alpha$-specific co-occurrence’ data, we observed that each value of affinity corresponds to a wide range of standardized co-occurrence (Figure S1c) or standardized Jaccard values (Figure S1d) when the associations are strong, making the standardized indices imprecise measures of the departure from nullity (i.e. of $\alpha \ne 0$). Figure S1e confirms that the standardized $X$ and $J$ indices are essentially deterministically related and effectively equivalent.

Null-hypothesis standardization usefully summarizes the null distribution (here, hypergeometric) primarily when the standardized distribution is close to a standard reference distribution, namely the normal, regardless of the fixed margins $(m_A, m_B, N)$. With small sample sizes, however, the distribution of the standardized co-occurrence count may be very non-normal, asymmetric, and subject to large jumps, making standardization unreliable.
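The lack of centring is easy to see numerically. In this minimal sketch (ours, with hypothetical margin triples), the null-standardized count is evaluated at a typical count from the extended hypergeometric with the same $\alpha = 1.5$ across three margin triples; it has no fixed target, while $\hat{\alpha}$ (e.g. from ML.Alpha in our CooccurrenceAffinity package) re-centres near 1.5:

```r
# Null-standardized co-occurrence Z evaluated at the median count of the
# extended hypergeometric with the SAME alpha = 1.5 but different margins.
library(BiasedUrn)

z_at_alpha <- function(mA, mB, N, alpha = 1.5) {
  x  <- qFNCHypergeo(0.5, mA, N - mA, mB, odds = exp(alpha))  # typical count
  mu <- mA * mB / N                                           # null (hypergeometric) mean
  v  <- mB * (mA / N) * (1 - mA / N) * (N - mB) / (N - 1)     # null variance
  (x - mu) / sqrt(v)
}

margins <- rbind(c(10, 10, 50), c(25, 30, 100), c(120, 150, 400))  # hypothetical triples
apply(margins, 1, function(m) z_at_alpha(m[1], m[2], m[3]))
# Z grows with table size even though alpha is identical throughout,
# so Z cannot serve as an estimate of the degree of association.
```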
USSG's second comment claims that affinity exhibits a curved relationship with the number of sites in their compartmented matrix, with a highly significant $R^2$ of 0.26 in quadratic regression (USSG Figure 1c). Their claim is invalid for two reasons: the quirky and unspecified way the ‘compartmented matrix’ co-occurrences were generated, and the general invalidity of interpreting linear-regression P-values as they do for occurrence-matrix data. Using our ‘$\alpha$-specific co-occurrence’ data, we show that USSG's observation of affinity being sensitive to the number of sites disappears completely ($R^2 < 0.013$; quadratic regression) for each of the four $\alpha$-specific scenarios (Figure 2b). Although we show Figure 2b for the sake of comparison with USSG, the P-values arising from linear or quadratic regressions like this on ecology/biogeography datasets are invalid, for two reasons: (a) the discreteness of the extended hypergeometric data and the small samples imply that the P-values (generated from normal distributions in all standard software) are far off; and, more importantly, (b) the data points for all species pairs in a measured set of species are not independent, a key assumption of standard regression software. The dependence arises because each species is reused in multiple species pairs. Of these two reasons, ecologists can probably tolerate non-normality of error distributions, although reasoning with those P-values on small data samples is sloppy science. But reason (b) implies that doing such regressions at all is misguided. See Appendix S3 for a third argument against the method of USSG's Figure 1c.

USSG's second comment also criticizes our affinity model for assuming that all sites are equally suitable for each species considered separately. That is a validly expressed limitation of the model. Yet their strong preference for continuing to use null-model P-values and standardized indices begs the question, since the hypergeometric null model that they, along with Griffith et al. (2016) and Veech (2013), use within the FE model assumes this same exchangeability. Our affinity model presents improvements on the FE model and on standardized co-occurrence indices. Appendix S4 addresses USSG's comments about our R script/package that we believe were misguided.

Throughout their commentary, and particularly in their summary, USSG persistently ignore the fact that their statistical toolkit contains no model of joint species occurrence under which pairwise occurrence shows strong association; this is the whole point of our innovation of affinity. Without conditioning on the prevalence counts $m_A$, $m_B$, mainstream statistics also treats essentially the same model within the class of loglinear models (Agresti, 2013). Mainali et al. (2022) presents our affinity model and supplies an ecology-relevant interpretation of it, and, along with Mainali and Slud (2022), shows how to use our CooccurrenceAffinity R package to achieve scientifically sensible analyses under this model.
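For completeness, here is a minimal sketch (ours, with stand-in data, since the $\alpha$-specific affinities themselves are not reproduced here) of the quadratic-regression check discussed above in connection with Figure 2b and USSG's Figure 1c:

```r
# Quadratic regression of affinity on number of sites, as in USSG's Figure 1c.
# Stand-in data: affinities centred at a fixed alpha, unrelated to site number.
set.seed(2)
nsites   <- sample(30:200, 231, replace = TRUE)   # one entry per species pair
affinity <- rnorm(231, mean = 1.5, sd = 0.4)      # stand-in alpha-hat values
fit <- lm(affinity ~ poly(nsites, 2))
summary(fit)$r.squared                            # near 0: no curvature with N
# Caveat from the text: in real occurrence matrices the pairs share species and
# are therefore dependent, so the regression P-values are not trustworthy anyway.
```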
KM was supported by the Grayce B. Kerr Fund, Inc. No permits were needed to carry out this research. The R script and data for the complete analysis and plots are available at https://github.com/kpmainali/Affinity_JBiogeo_Rejoinder. Supporting information: Appendix S1–S4. Please note: the publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.

Kumar Mainali is an ecologist and statistician who develops novel mathematical/statistical indices for pressing scientific challenges and uses cutting-edge AI technology for precision conservation. His research and predictive analytics operate at the intersection of ecology, conservation biology, biogeography and climate change. He frequently works with species distribution models and expert maps. Eric Slud is a mathematical statistician working on problems of biostatistical survival analysis, survey sampling inference and inference for stochastic processes, with recent forays into ecology, spatial statistics and genomics. The connecting threads in these research interests are the formulation of statistical models, theoretical study of the identifiability of parameters, and derivation of the mathematical properties of estimators.

Author Contributions: Kumar P. Mainali and Eric Slud conceived the ideas; Eric Slud led the development of new functions for standardized quantities; Kumar P. Mainali and Eric Slud analysed the data and wrote the manuscript.