Response Feature Analysis for Repeated Measures in Ecological Research

Bulletin of The Ecological Society of America (2021)

Abstract
Response feature analysis (RFA) involves analyzing a data set, in this case repeated measures data, based on a summary function of the data rather than the data themselves. There are two steps to a typical RFA. The first is a creative summary of the data, done separately on each experimental or sampling unit, that produces a single summary measure addressing the question of scientific interest. The second involves choosing an appropriate statistical test for analysis of the reduced data. Despite its long history (a paired t test is a simple example of RFA), this powerful and sophisticated approach has never really received the attention it deserves in ecological and natural resource management. The functional benefits of RFA (simplicity, intuitive sensibility, and expansion of the types of questions one can address) argue for its increased use. The questions of interest for data interpretation and decision making in both basic and applied biological studies are often simple: “How many animals are in a given habitat?” “Does abundance change over time?” “Does abundance differ between samples A and B?” In such studies, we advocate focusing on simple functional attributes of the data that better describe the question of interest than the data values themselves. Indeed, we view RFA as a structured response to the plea for simplicity in data analysis expressed by Murtaugh (2007). To paraphrase, just because it is possible to conduct a general linear mixed model (GLMM) on a data set does not mean that one should. Quite frequently, scientific progress comes from a simplification that clarifies a response or difference, not from increasing statistical complexity. Conceptually, RFA is a two-stage procedure: First, one computes a summary function of the data from each sampling unit; and second, one then performs an appropriate statistical analysis on the fitted responses. 
A paired t test is an example of this approach, demonstrating that the conceptual basis of RFA has a long history. For example, Snedecor and Cochran (1989:71) present an 18th century agricultural data set that was properly analyzed using RFA (i.e., a paired t test). For a paired t test, we (1) calculate the difference within each pair of observations on the same entity; then (2), proceed with a one-sample t analysis on the differences (i.e., summary function) of those observations. A paired t analysis is perhaps the simplest use of RFA. As an example of how a creative summary step may enable one to discover new and relevant questions, a researcher might also calculate ratios of the data within pairs; the subsequent one-sample analysis would then quantify relative effects of the treatment instead of mere arithmetic effects. An additional example is provided by Allen et al. (1983), who studied the effect of four different diets on weight gain in calves and demonstrated how RFA may be used with repeated measures data followed by an ANOVA. Allen et al. (1983) fitted a linear regression to the weight gains of each calf across time. The slopes from the fitted regressions (i.e., the chosen response summary) were compared using a two-sample t test of the difference in average slopes between any two specified treatment groups, based on an analysis suggested by Wishart (1938). To be sure, a repeated measures ANOVA on these data would lead to the same conclusions, but by using RFA the answer can be found using only simple linear regression followed by a two-sample t test (or a one-way ANOVA on all four treatments), which obviates the need to model the covariance structure of the data. Finally, RFA allows for a more thorough inspection of the data structure, increasing transparency and facilitating identification of potential problems. Oddly, the authors did not identify the diets beyond numbering them one through four. 
We suspect this was because their interest lay in exemplifying a statistical method rather than in presenting an animal science study per se. That said, they had eight animals in each treatment group, and the three questions of interest were whether diets one and two differed from diets three and four, whether diet one differed from diet two, and whether diet three differed from diet four. Their t tests for the latter two questions (each with 15 degrees of freedom) were not significant (P = 0.34 and 0.70, respectively), but diets one and two did differ significantly from diets three and four (t test with 31 degrees of freedom, P = 0.0075).
The term RFA was coined by Everitt (1995, 2002:91) and Everitt and Pickles (2000); these references are not well cited in the ecological or conservation literature. The authors describe “response feature analysis” as “the use of summary measures” for repeated measures data and list several potential summary measures, including (1) the overall mean, (2) the maximum or minimum value of a sample, (3) the slope of a regression (of the variable of interest on time), and (4) the time to reach a particular value (chosen for biological meaning). A typical repeated-measures ANOVA could not address (2) or (4): ANOVA works on means, not minima or maxima, and (4) requires an initial analysis to arrive at a time estimate for each subject. Everitt (2002:Chapter 6) provides an example demonstrating the simple elegance of the RFA approach by comparing RFA with a random effects linear model, whether conducted as an ANOVA or as a GLMM. The two analyses resulted in essentially identical conclusions, but the RFA approach was much simpler to both implement and understand. Other investigators have advocated an RFA approach, including Ramsey and Schafer (2002), who used the term “summary measures.” In both cases, these authors mention the possibility of an RFA, but then go on to emphasize classical repeated measures.
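The two-stage procedure described above (one slope per animal, then a two-sample t test on the slopes) can be sketched in a few lines of Python. This is only an illustrative sketch: the weights, the number of animals, and the function names `ols_slope` and `pooled_t` are invented for the example and are not taken from Allen et al. (1983). It reports the t statistic and degrees of freedom; a P value would require a t-distribution table or a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def ols_slope(times, values):
    """Stage 1 summary: least-squares slope of values regressed on times."""
    t_bar, v_bar = mean(times), mean(values)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


def pooled_t(sample_a, sample_b):
    """Stage 2 test: pooled-variance two-sample t statistic and its df."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(sample_a)
                  + (nb - 1) * variance(sample_b)) / df
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(sample_a) - mean(sample_b)) / se, df


# Hypothetical weights (kg) at weeks 0..3, three animals per diet.
# These numbers are invented purely to make the sketch runnable.
weeks = [0, 1, 2, 3]
diet_a = [[50, 54, 59, 63], [48, 53, 57, 62], [52, 55, 60, 64]]
diet_b = [[51, 53, 56, 58], [49, 52, 54, 57], [50, 52, 55, 57]]

# One number per animal: its growth rate (kg per week).
slopes_a = [ols_slope(weeks, w) for w in diet_a]
slopes_b = [ols_slope(weeks, w) for w in diet_b]

t_stat, df = pooled_t(slopes_a, slopes_b)
print("slopes, diet A:", slopes_a)
print("slopes, diet B:", slopes_b)
print("t =", t_stat, "on", df, "degrees of freedom")
```

Because each animal contributes a single slope, the second stage treats the slopes as ordinary independent observations, which is why no covariance structure across time needs to be modeled.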
Two of us are fish ecologists, and so it is natural for us to develop several examples from our own data that are generalizable to other fields. We briefly discuss three fisheries examples that highlight the importance of choosing summary characteristics that most directly and efficiently test the hypothesis of interest. Depending on the question being asked, one may create summaries using all or a subset of the data. Taking time to choose the most appropriate summary measure also increases the probability that a statistical analysis will actually address the question of interest and lead to better science and management. Bozeman and Grossman (2019) conducted feeding trials on a drift-feeding stream fish (Arctic Grayling, Thymallus arcticus) to test the hypothesis that stream velocity affected prey capture success (a complete description of the experimental design is provided by Bozeman and Grossman 2019). In short, individual fish were subjected to seven increasing stream velocities in an experimental stream flume. Specimens were fed nine prey items at each treatment velocity, and the number of prey captured and missed at each velocity was recorded (Fig. 1). We plotted the proportion captured by each fish at each velocity, as shown in Fig. 2A (for a subset of the fish). We then fitted a logistic regression separately to each fish, which yielded fitted capture probabilities as a function of velocity (Fig. 2B). This was a prospective study (as opposed to retrospective) because capture probabilities were an observed property of the data; that is, they were random and not pre-chosen. In such a case, it is legitimate to report and discuss probabilities. To address our original hypothesis (that velocity affected prey capture success), we decided that the best summary measure would be the odds ratio (a direct measure of how the odds of prey capture change with increased velocity) based on the individual logistic regressions. 
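A minimal sketch of this summary step, assuming invented capture counts and treatment velocities (the source does not list them): each fish gets its own logistic regression, and the slope is converted to an odds ratio per 10 cm/second. The Newton-Raphson fitter `fit_logistic` is a hypothetical helper written here to keep the example self-contained; it is not the authors' software, and a statistics package would normally be used instead.

```python
from math import exp, sqrt
from statistics import mean, stdev


def fit_logistic(x, successes, trials, iterations=25):
    """Fit logit(p) = b0 + b1 * x to binomial counts by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)
            g0 += yi - ni * p
            g1 += (yi - ni * p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system by explicit inversion.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical design: seven velocities, nine prey items offered at each.
velocity_cm_s = [10, 20, 30, 40, 50, 60, 70]   # assumed values
x = [v / 10 for v in velocity_cm_s]            # predictor in units of 10 cm/s
trials = [9] * len(x)

# Invented capture counts for three fish (illustration only).
captures_by_fish = {
    "fish_1": [9, 8, 8, 7, 6, 5, 4],
    "fish_2": [8, 8, 7, 6, 6, 4, 3],
    "fish_3": [9, 9, 8, 7, 5, 4, 2],
}

# One summary number per fish: the odds ratio per 10 cm/s increase.
odds_ratios = []
for fish, captures in captures_by_fish.items():
    _, slope = fit_logistic(x, captures, trials)
    odds_ratios.append(exp(slope))

mean_or = mean(odds_ratios)
se_or = stdev(odds_ratios) / sqrt(len(odds_ratios))
print("per-fish odds ratios:", odds_ratios)
print("mean odds ratio:", mean_or, "SE:", se_or)
```

Expressing velocity in units of 10 cm/second means that exp(slope) is directly the odds ratio per 10 cm/second, so the mean and standard error across fish can be read on the scale of interest.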
From our sample of 15 fish, the calculated mean odds ratio is 0.82 (standard error [SE] = 0.014, 95% confidence interval [CI] = 0.78–0.85). It is easier to interpret this ratio by reporting the size of the decline in the odds and switching to percentages: In short, the odds of prey capture drop by 18% (SE = 1.4%, 95% CI = 15–21) for each 10 cm/second increase in treatment velocity. The second example comprises a more traditional repeated measures data set involving permanent study sites repeatedly sampled over time. The data come from Alcova Reservoir in Wyoming and are part of a larger study quantifying changes in abundance of Rainbow Trout (Oncorhynchus mykiss). The data indicated that Rainbow Trout abundance was increasing from 1994 to 1998 (Fig. 3A), and they display a typical pattern suggesting that the change in abundance per year is relative (i.e., a percentage) rather than a constant increment (e.g., eight fish every year regardless of abundance). The overall curvature in the data suggests that the increase is exponential; moreover, as is common with biological data, the variance increases with mean abundance, suggesting that variation also is proportional. As such, the data provide an opportunity for RFA: regression across time, with the response log-transformed (Fig. 3B). A simple linear regression, using log-transformed abundances as the response variable, was an adequate fit to the data. The analyses are summarized in Table 1. The mean of the back-transformed slopes is 1.15 (95% CI = 1.08–1.22). That is, we estimate a 15% yearly increase and are 95% confident that it is between 8% and 22%. Given the low sample size, one might be skeptical of the validity of using the t distribution, since arguing for at least approximate normality for the distribution of the means would be dubious at best. 
Accordingly, we generated a bootstrap CI (with 10,000 bootstrap replications), yielding the interval 1.09–1.21 (i.e., we are 95% confident that the true increase is between 9% and 21%). Qualitatively, this CI matches the normality-based one, but it does not depend on any assumptions about the distribution of the mean.
Our third example uses the data from example 1 but illustrates the ability of RFA to address questions that are beyond the scope of traditional repeated measures methods. The question we address is: At what velocity does the probability of capturing food items drop below 50%? We note first that a logistic regression transforms to a linear regression (with predictable heteroscedasticity due to the nature of variability in binomial data), with the response being L = logit(P) = ln(P/(1 − P)), where P is the probability of capture, and velocity as the predictor. Because we want to predict velocity, it behooves us to “invert” the regressions for each fish, using L as the predictor and velocity as the response. This inversion is important because once one has regressed Y on X and obtained a line of the form Y = b0 + b1X, it is sorely tempting to use this equation to find X, given Y: X = (Y − b0)/b1. Unfortunately, this will likely yield biased estimates of the average of X, given Y. The solid line in Fig. 4 is the regression of Y on X. Given that the purpose of a regression line is to estimate the average of the response given a value of the predictor, the solid line in Fig. 4 more or less splits the data evenly (above and below the line) for given values of X. That same line does a highly biased job, though, of predicting the average of X, given Y. Using that line, for instance, given Y = 4, one would estimate X to be approximately 2. The dashed line in Fig. 4 splits the data horizontally and is in fact the regression line of X on Y in this figure. We can see immediately that for Y = 4, the average of X is approximately 4.5.
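The difference between inverting the Y-on-X line and regressing X on Y directly can be checked numerically. The sketch below uses a small invented data set (it is not the data behind Fig. 4), and the helper names `ols`, `x_by_inverting`, and `x_by_regressing` are hypothetical. It only shows that the two lines give different answers away from the mean; it does not reproduce the figure.

```python
from statistics import mean


def ols(pred, resp):
    """Return (intercept, slope) of the least-squares line resp = a + b * pred."""
    p_bar, r_bar = mean(pred), mean(resp)
    sxy = sum((p - p_bar) * (r - r_bar) for p, r in zip(pred, resp))
    sxx = sum((p - p_bar) ** 2 for p in pred)
    slope = sxy / sxx
    return r_bar - slope * p_bar, slope


# Invented scatter of points with a positive but imperfect relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 1.2, 3.9, 3.1, 6.2, 4.8, 7.1, 6.0]

b0, b1 = ols(x, y)   # regression of Y on X
a0, a1 = ols(y, x)   # regression of X on Y


def x_by_inverting(y0):
    """Solve the Y-on-X line for X: X = (Y - b0) / b1."""
    return (y0 - b0) / b1


def x_by_regressing(y0):
    """Predict X from the X-on-Y line: X = a0 + a1 * Y."""
    return a0 + a1 * y0


y0 = 2.0
print("inverted Y-on-X line:", x_by_inverting(y0))
print("X-on-Y regression:  ", x_by_regressing(y0))
# The product of the two slopes equals r squared, so the lines coincide
# only when the correlation is perfect.
print("a1 * b1 =", a1 * b1)
```

Both lines pass through the point of means, so the two estimates agree there and diverge as the chosen Y value moves away from it. In the setting of example 3, X plays the role of velocity, Y the role of the logit L, and the value of interest is L = 0.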
Accordingly, we ran a regression of velocity on the logit L for each fish; since L = 0 coincides with a probability of 0.5 (our predictor value of interest), the intercepts from those equations served as the estimates of the velocity at which the probability of capture reached 50%. The mean velocity at which the probability of capturing prey dropped below 50% was 44.4 cm/second (SE = 1.02).
We have shown how RFA may provide additional clarity and insight into common ecological questions and that the ease and simplicity of RFA argue for its increased use. Such insights are valuable and obtainable without advanced statistical analyses. In our worked examples, the RFA principle led to analyses that were simpler, not more complex, than more standard choices. In addition, RFA enabled us to identify additional patterns in the data that were not visible using traditional analyses. The simplicity inherent in RFA may also expose critical assumptions inherent in more traditional analyses. For example, Murtaugh (2007) examined the effect of predator removal on body size of invertebrates in six ponds, three of which had predators removed. The original analysis used a nested-plot, random coefficients ANOVA. His subsequent analysis was simplified to a two-sample t test that used a summary measure of the average sizes in each pond. The complexities in the first analysis obscured the importance of assuming normality for the variation in body size among ponds (important because of his small sample sizes). That element became more visible through the RFA approach of summarizing (he averaged all the invertebrate sizes in each pond) and then analyzing.
Complex statistical approaches have their place; however, complexity in and of itself is not a virtue in research. RFA as described above is a simple and understandable approach that will be useful for a variety of common research questions in fisheries and other disciplines. Sawyer et al. (2006) measured the relative frequency of use of a large sample of habitats by radio-collared mule deer (immensely reducing the dimensionality of the data into meaningful summaries: one measure per animal, much like our “one odds ratio per fish” example), then used those relative frequencies as the response variable in a multiple regression to study habitat preferences. Clapp et al. (2014) possessed massive amounts of movement data on bighorn sheep reintroduced into wildlands. They were interested in estimating “time until settling.” For each animal, they measured the SD in animal movements on a weekly basis, and when the SD fell below a predetermined threshold, the animal was deemed to have settled and the time duly recorded. That reduced the data to one measured value per animal and greatly facilitated subsequent analyses, comparing, for instance, settling times for ewes to those for rams. Superficially, this analysis looks similar to “time till event” modeling, but the latter methodology is only useful for quantifying when a discrete event occurs (e.g., time until death, or time until the next epileptic seizure). Similarly, in our example 3, where the variables of interest are the proportion of prey items captured and stream velocity, we inverted the original roles (response, predictor) in order to estimate the velocity at which prey capture probability drops below 50%. A logistic regression setting where the original predictor is time could be similarly inverted (e.g., how much time passes until 90% recovery from some trauma), but this would still not be a classic “time until event” analysis.
Albert Einstein is frequently quoted as having said, “Make everything as simple as possible, but not simpler,” and we believe that RFA will aid scientists in achieving this maxim.
We would like to thank the many individuals who aided in various aspects of this research, including A. Grossman, B. Grossman, and T. Simon. BBB would also like to acknowledge the many years of encouragement and support from D. Bozeman. The North Pacific Research Board (grant #1424), the Graduate School and Daniel B. Warnell School of Forestry and Natural Resources of the University of Georgia, and USDA McIntire-Stennis grant GEOZ-00196-Grossman provided financial and material support for this project.
Keywords
repeated measures, ecological research, analysis, response