acta oecologica 33 (2008) 66–72
available at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/actoec
Original article
Statistical tests for biological interactions: A comparison of permutation tests and analysis of variance
Michael E. Fraker (a,*), Scott D. Peacor (b,c)
(a) Department of Ecology and Evolutionary Biology, University of Michigan, 830 North University, Ann Arbor, MI 48109-1048, USA
(b) Department of Fisheries and Wildlife, Michigan State University, 13 Natural Resources Building, East Lansing, MI 48824, USA
(c) Great Lakes Environmental Research Laboratory (NOAA), 2205 Commonwealth Boulevard, Ann Arbor, MI 48105-2945, USA
article info
Article history:
Received 7 May 2006
Accepted 11 September 2007
Published online 26 October 2007
Keywords: ANOVA; Interaction; Non-parametric; Permutation; Randomization; Transformation
abstract
Interaction terms from statistical tests are often used to make inferences about biological processes. For interaction terms to be biologically meaningful, it is critical that the statistical method used tests a model that corresponds to a realistic null hypothesis. A commonly used data analysis method, the analysis of variance F-test (ANOVA), is limited when examining interactions because there are a limited number of statistical models that it can test. This is further complicated by the fact that data transformations, which affect the model being tested, are sometimes required to meet ANOVA’s assumptions. Thus, when using ANOVA, it can be difficult to determine whether interactions that are found in data were produced by biological mechanisms or are statistical artifacts due to an unrealistic model. A survey of the literature indicated that these shortcomings are often not recognized despite ANOVA’s widespread use. In this paper, we evaluate the suitability of an alternate method, permutation tests, compared to ANOVA. We compare the range of potential statistical models that each method can test and the power of each method to detect interactions when using an appropriate model. We provide two simulated experiments on species interactions that show that ANOVA and permutation tests have similar power when testing an appropriate statistical model, but that permutation tests provide an advantage over ANOVA in their ability to test a wider range of models. We conclude that permutation tests can be used to make inferences, potentially impossible with ANOVA, concerning biological interactions.
© 2007 Elsevier Masson SAS. All rights reserved.
1. Introduction
In recent years, increasing attention has been given to the mechanistic basis of systems in which many factors (e.g. species, chemicals, and genes) interact to determine their structure and dynamics (e.g. in animal community ecology: Sih et al., 1998; in plant community ecology: Lortie et al., 2004; in epidemiology: Greenland, 1993; in environmental
toxicology: Relyea, 2004; and in evolutionary biology: Wade, 1992). An example of this trend is the interest in potential behaviorally mediated trophic cascades, in which a predator interacts with a consumer to affect resource density via predator-induced changes in consumer behavior (Werner and Peacor, 2003; Schmitz et al., 2004). Another example of this type of interaction is facilitative interactions in which the probabilities of certain species of plants becoming established in new
* Corresponding author. Fax: +1 734 763 0544. E-mail address: [email protected] (M.E. Fraker).
1146-609X/$ – see front matter © 2007 Elsevier Masson SAS. All rights reserved. doi:10.1016/j.actao.2007.09.001
areas depend on the presence of other species (Richardson et al., 2000). These types of interactions are important to identify because determining the structure and dynamics of communities containing them requires understanding how the direct, pairwise interactions between two factors are affected by the presence of the other factors. Because statistical analysis is often used to identify these types of interactions, it is necessary to distinguish an interaction between two biological factors that influences their independent effects, hereafter a “biological interaction,” from the interaction term in standard statistical tests, hereafter a “statistical interaction.” To clarify this distinction, consider the combined effect of two predators on prey density in a short-term experiment (i.e. with no reproduction). Biological interactions arise when a new or altered biological process must be invoked to explain the interactive effect of the two predators on prey density relative to the predicted combined effect of each predator in isolation. Such processes could include one predator inducing changes in the foraging behavior of the second predator or the prey that affect the prey’s encounter rate or vulnerability to either predator. In contrast, statistical interactions may arise whether or not there are actual changes in the fundamental interactions between the predators and prey. Statistical tests use an underlying model (hereafter a “statistical model”) to determine the significance of individual factors (e.g. species) and their interactions affecting the magnitude of measured responses. If the statistical model does not accurately represent the biological process being tested, a statistically significant interaction may not have or need a biological explanation (Billick and Case, 1994; Anderson et al., 2000). Wootton (1994) gives an example in the two-predator, common-prey scenario discussed above.
If the two predators consume 60 and 70% of the prey, respectively, when alone and nearly all of the prey when together, a standard ANOVA of untransformed data would yield a significant statistical interaction between the two predators. This is because in the underlying statistical model the individual effects of the two predators combine additively. Thus 130% (i.e. 60 + 70%) of the prey would need to be consumed by both predators when together in order for the test to indicate no interaction, which is of course not possible. This example shows how a statistical interaction could result from a scenario that does not exhibit any relevant biological interactions. Rather, the statistical interaction is due simply to the fact that animals cannot continue to eat when there is no food. There can therefore be a discord between the statistical interactions used to examine biological interactions in empirical studies and the biological processes of interest. While any non-additivity of the main effects results in a statistical interaction, the non-additivity can be generated without changing the fundamental biology, as in the example above. Indeed, as we discuss in more detail below, a statistical interaction can be found in any data depending on the statistical model tested, whether implicitly or explicitly chosen by the investigator. A common approach to test for biological interactions is to analyze data from a factorial experimental design using a multifactorial ANOVA F-test, with the interaction terms in the ANOVA being used as the test statistics (see examples in Wootton, 1994). However, ensuring that a biologically appropriate model is tested by ANOVA can be problematic because
the set of potential statistical models available to test with ANOVA is restrictive, and may not be able to capture important biological features of the system. Further, ANOVA has several restrictive assumptions about the form of the data that can be tested, and meeting these assumptions can require that data be transformed, which changes the statistical model being tested. In this paper, we first present a survey of the literature to gauge how often statistical models are shown to be biologically appropriate during data analysis in ecological research (i.e. so that statistical interactions indicate biological interactions). We then simulate experiments on species interactions to analyze and compare the suitability of an alternate method, permutation tests, compared to ANOVA to examine biological interactions. We compare the range of potential statistical models available to permutation tests and ANOVA, and compare the power of each method to detect interactions when using an appropriate model.
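As a rough illustration of the two-predator example from Wootton (1994) discussed above, the following Python sketch contrasts the combined consumption expected when the two effects simply add with the expectation when each predator removes the same proportion of whatever prey the other leaves. The function names are hypothetical, and the multiplicative calculation is shown only as one plausible no-interaction expectation, not as a calculation taken from Wootton's paper.

```python
def additive_expectation(p1, p2):
    """Combined proportion of prey consumed if the two effects simply add."""
    return p1 + p2


def multiplicative_expectation(p1, p2):
    """Combined proportion consumed if each predator removes the same
    proportion of the prey left by the other (survival probabilities multiply)."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)


# Proportions of prey consumed by each predator when alone (values from the text).
p1, p2 = 0.60, 0.70

print(f"additive expectation:       {additive_expectation(p1, p2):.2f}")
print(f"multiplicative expectation: {multiplicative_expectation(p1, p2):.2f}")
```

Under these assumptions the additive expectation exceeds 100% of the prey, whereas the multiplicative expectation stays within the possible range, so observing "nearly all" prey consumed need not imply any biological interaction.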
2. Literature survey
We surveyed how ANOVA and other methods are currently being used to test for biological interactions. The survey was intended to identify how common the potential problems associated with using statistical interactions to identify biological processes that we identified in Section 1 may be. We identified all articles published in Ecology, Journal of Animal Ecology, and Oikos during 2004 that statistically tested for biological interactions and made inferences on the biological mechanisms based on the results of those tests. For each article, we recorded which statistical method was chosen, whether or not the data were transformed, and whether or not a justification was provided for the underlying statistical model (i.e. whether the statistical model was shown to be biologically appropriate). Note that we surveyed data transformations because they determine the underlying statistical model when using ANOVA. We used two standards to describe the amount of justification. Our “strong” standard required that the underlying statistical model be clearly stated and that the statistical model be shown to be appropriate for the data and the biological system. This standard could be met, for example, by showing what the model would predict for the null hypothesis of no interaction and what a statistical interaction would mean biologically. Our “weak” standard required a statement that at least implied that the underlying model was appropriate for the data and biological system, but did not need to be explicitly shown. This standard could be met, for example, by noting that the chosen statistical test was testing a particular hypothesis (i.e. a particular statistical model) or by citing another paper that justified the chosen test. One hundred and fifty-one papers were found in which at least one interaction was included in the results. In 135 of the papers, some form of ANOVA was used, and in 16, other methods were used.
In the papers using ANOVA, data transformation was common. In 107 (79%) of the papers, authors transformed their data prior to analysis, with 56 (52%) of these 107 specifically mentioning using a transformation in order to meet the assumptions of ANOVA. Overall, out of the 135
papers using ANOVA, only 7% provided strong justification and only 38% provided weak justification for the statistical test used. The literature survey suggests that the problems identified in Section 1 may be common.
3. Materials and methods
First, we compared the ability of ANOVA and permutation tests to test null hypotheses of no interaction in data sets produced by three different models (described below), each of which represents a different biological process generating the data, hereafter a “biological model.” For example, in the two predators–single prey example in Section 1, the biological model represents the predicted combined predation rate based solely on each predator’s predation rate when alone (i.e. independent of relevant biological interactions between the predators as discussed above). We produced data sets similar to those produced in empirical studies by basing the data sets on two × two factorial designs with ten replicates. We tested for interactions in each data set using ANOVA and permutation tests with three different statistical models. When a statistical model matches the biological model that generated the data, no statistical interactions should be detected (i.e. Type-I error rates should be low) because the statistical model is an accurate null hypothesis; the statistical model explains the data without needing to assume biological interactions are occurring. We next generated data sets in which a factor that represented a biological interaction was added to the biological model. The factor, which spanned a range of magnitudes, represents a biological interaction that causes the response variable to deviate from its expected value if the null biological model is true. For example, in the two predators–single prey example in Section 1, the interaction term would represent a process that causes the combined predation rate to deviate from that predicted by combining each predator’s predation rates when alone. We then used ANOVA and permutation tests to test for statistical interactions using the statistical model that matched the biological model in order to compare the power of ANOVA and permutation tests to detect biological interactions in each data set.
We created each simulated data set by first defining mean values ($\mu$) for the control and the individual factor treatments: for the no-factor control, $\mu_C = 10$; for the factor A only treatment, $\mu_A = 6$; and for the factor B only treatment, $\mu_B = 8$. These means represent negative main effects that each factor has on the response variable. The treatment means when factor A and factor B are both present ($\mu_{AB}$) were then generated by three biological models. Each biological model represents a different expected effect on the response variable when both treatments are combined and no biological interaction occurs; i.e. they represent different predictions of the expected combined effect of two factors if no additional biologically relevant interactions come into play. First, an additive model was used, in which

$$\mu_A = \mu_C(1 + a), \qquad \mu_B = \mu_C(1 + b), \qquad \mu_{AB} = \mu_C(1 + a + b) \tag{1}$$
where $a$ and $b$ represent a proportional change in the response variable. The additive model assumes that the combined effect of two factors is equal to the sum of their independent proportional effects. For example, in our case, factor A had a proportional effect of $a = (\mu_A - \mu_C)/\mu_C = (6 - 10)/10 = -40\%$ and factor B had a proportional effect of $b = -20\%$, so the additive model predicts that $\mu_{AB} = 10(1 - 0.4 - 0.2) = 4.0$. Second, a multiplicative model was used (Wilbur and Fauth, 1990), in which

$$\mu_A = \mu_C(1 + a), \qquad \mu_B = \mu_C(1 + b), \qquad \mu_{AB} = \mu_C(1 + a)(1 + b) \tag{2}$$
where $a$ and $b$ again represent a proportional change in the response variable. If the multiplicative model explains the biological data, factor A (or B) has the same proportional effect on the outcome whether or not factor B (or A) is also acting. For example, in our case, the multiplicative model predicts that $\mu_{AB} = 10(1 - 0.4)(1 - 0.2) = 4.8$. Note that each factor may be acting at the same time or at different times, but their total combined effect is equivalent to a situation in which the first factor has its independent proportional effect on the response variable, and then the second factor has its independent proportional effect on what remains of the response variable (Fowler and Rausher, 1985). A multiplicative model has been used, for example, to describe the expected combined predation rate of two predators when feeding on a fixed initial number of prey (Soluk and Collins, 1988). The third model is based on a biological model derived by Fowler and Rausher (1985), who examined the competitive effects of two plant species on the growth of a third. They introduce a model in which the combined effect of two plants (i.e. factors) maintains the ratios of their independent effects on the third plant’s growth (i.e. response variable). Fowler and Rausher assume that a plant’s growth is inversely proportional to the amount of resources used by its competitors. They also assume that the relative proportion of resources acquired by any plant species is the same in two-species experiments as in three-species experiments. This differs from the multiplicative model, in which the effect of a factor is a constant proportion, rather than a relative proportion that depends on the number of other species in the experiment. To represent this model, we use $1/a'$ as the fraction of the mass of the focal species when grown with species A compared to when grown alone.
The reduction in growth of the focal species when in competition with species A to $1/a'$ of its mass when alone indicates that the resources were used by the two species in the ratio of $1/a' : (1 - 1/a')$. The reduction in growth of the focal species when grown with the second species is analogous, such that, with the three species together,

$$\mu_A = \frac{\mu_C}{1 + (a' - 1)} = \frac{\mu_C}{a'}, \qquad \mu_B = \frac{\mu_C}{1 + (b' - 1)} = \frac{\mu_C}{b'}, \qquad \mu_{AB} = \frac{\mu_C}{1 + (a' - 1) + (b' - 1)} = \frac{\mu_C}{a' + b' - 1} \tag{3}$$
Using the means we defined above, species A reduced the response variable by 40% to 6, so $1/a' = 6/10 \approx 1/1.7$, and $a' \approx 1.7$. Species B reduced the response variable by 20% to 8, so $1/b' = 8/10 = 1/1.25$, and $b' = 1.25$. Therefore, the Fowler and Rausher model predicts that the effect of both species combined will be to reduce the response variable to $\mu_{AB} = 10/(1.7 + 1.25 - 1) = 5.1$.
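The three null expectations in Eqs. (1)–(3) can be written directly in terms of the treatment means. The sketch below is a minimal illustration, assuming a hypothetical helper name (`expected_ab`) and model labels that are not part of the original analysis. Note that with the unrounded value $a' = 10/6$ the third model gives about 5.2; the value of 5.1 in the text follows from rounding $a'$ to 1.7.

```python
def expected_ab(mu_c, mu_a, mu_b, model):
    """Expected mean of the combined AB treatment when there is no interaction.

    mu_c, mu_a, mu_b: means of the control, A-only and B-only treatments.
    model: 'additive' (Eq. 1), 'multiplicative' (Eq. 2) or
           'fowler_rausher' (Eq. 3).
    """
    if model == "additive":
        # mu_C * (1 + a + b) with a = (mu_A - mu_C) / mu_C, b likewise
        return mu_a + mu_b - mu_c
    if model == "multiplicative":
        # mu_C * (1 + a) * (1 + b)
        return mu_a * mu_b / mu_c
    if model == "fowler_rausher":
        # mu_C / (a' + b' - 1) with a' = mu_C / mu_A, b' = mu_C / mu_B
        return mu_c / (mu_c / mu_a + mu_c / mu_b - 1.0)
    raise ValueError(f"unknown model: {model}")


# Treatment means used in the simulated experiments.
mu_c, mu_a, mu_b = 10.0, 6.0, 8.0
for model in ("additive", "multiplicative", "fowler_rausher"):
    print(f"{model:>15}: {expected_ab(mu_c, mu_a, mu_b, model):.2f}")
```

Expressing every model as a function of the three observed means is a design choice that makes the same helper usable for both generating data and testing a null hypothesis.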
A random number generator was used to produce 10 replicate data points around the four treatment means of the control treatment ($\mu_C = 10$), the factor A treatment ($\mu_A = 6$), the factor B treatment ($\mu_B = 8$), and the combined AB treatment for each biological model (Eqs. (1)–(3)). Data points were generated from a potential set of normally distributed values, all with variance = 1 around each treatment mean $\mu$. This was done 100 times for each biological model, which produced 100 replicate data sets, each containing 40 data points. These data sets represent the data that would be produced by two × two factorial designs if each biological model was true. Each replicate of each data set was tested for A × B interactions using ANOVA and permutation tests. We first compared how well each method tested the null hypothesis of no interaction in each type of data. Finding a significant interaction (i.e. Type-I error, Anderson and Legendre, 1999) indicated that the statistical method was inappropriately chosen to yield insight into biological interactions because it did not test an appropriate underlying statistical model. To estimate the Type-I error for each method, we calculated the proportion of the time that each method identified a statistical interaction in each data set. We used three statistical models with ANOVA F-tests. First, we tested untransformed data. This procedure implicitly uses an additive statistical model (analogous to Eq. (1)). Second, we log transformed the data (i.e. the means of the data points in each treatment) in order to use a multiplicative statistical model (analogous to Eq. (2)). Third, we tested the data after using a square root transformation, which is used occasionally to meet ANOVA’s assumptions. This transformation does not correspond to a known, or easily interpretable, biological model.
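To make the link between a transformation and the statistical model concrete, the following sketch computes the A × B interaction F statistic of a balanced two × two design on the raw, log, and square-root scales. It is an illustration only: the helper name `interaction_f`, the small fixed offsets standing in for random noise, and the approximate critical value of 4.11 for F with 1 and 36 degrees of freedom at the 0.05 level are assumptions made here, and the original analyses were run in SPSS.

```python
import math


def interaction_f(groups, transform=lambda x: x):
    """F statistic for the A x B term of a balanced 2 x 2 ANOVA.

    groups: dict with keys 'C', 'A', 'B', 'AB', each a list of n replicates.
    transform: function applied to every data point before the analysis.
    """
    data = {k: [transform(x) for x in v] for k, v in groups.items()}
    n = len(data["C"])  # balanced design: same n in every cell
    means = {k: sum(v) / n for k, v in data.items()}
    # Interaction contrast: zero when the cell means are additive on this scale.
    contrast = means["C"] - means["A"] - means["B"] + means["AB"]
    ss_interaction = n * contrast ** 2 / 4.0  # 1 degree of freedom
    ss_error = sum((x - means[k]) ** 2 for k, v in data.items() for x in v)
    ms_error = ss_error / (4 * (n - 1))
    return ss_interaction / ms_error


# Deterministic stand-in for noise so the example is reproducible by hand.
offsets = [-0.1, 0.1] * 5
cell_means = {"C": 10.0, "A": 6.0, "B": 8.0, "AB": 4.8}  # multiplicative model
groups = {k: [m + e for e in offsets] for k, m in cell_means.items()}

F_CRIT = 4.11  # approximate 0.05 critical value for F(1, 36); an assumption
for name, fn in (("untransformed", lambda x: x),
                 ("log", math.log),
                 ("square root", math.sqrt)):
    f = interaction_f(groups, fn)
    print(f"{name:>13}: F = {f:.3g}, exceeds critical value: {f > F_CRIT}")
```

With means generated by the multiplicative model, the interaction contrast vanishes (up to the effect of the offsets) only on the log scale, which is why the log transformation corresponds to a multiplicative statistical model.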
Note that the only known statistical models available to test with ANOVA are the additive and multiplicative, so no statistical model was available with ANOVA to represent the Fowler and Rausher model. All ANOVA tests were conducted using SPSS 12.0 (SPSS, Inc.). We also tested the data using three statistical models in permutation tests: an additive model, a multiplicative model, and the Fowler and Rausher-based model described above. We used the biological models above (Eqs. (1)–(3)) for the additive, multiplicative, and Fowler and Rausher-based statistical models. We tested these statistical models by comparing how well the untransformed data in each replicate data set fit the values predicted by each model. If the statistical model fits the data, the residuals in each treatment should be exchangeable without changing the value of a test statistic calculated from those data (Fisher, 1935; ter Braak, 1992). Significance is inferred by comparing the value of the test statistic of the original data with the range of values of the test statistics computed from every permutation of the data (Anderson and ter Braak, 2003). Permutation tests do require that treatments were randomly allocated at the beginning of the experiment and that errors are homogeneous among treatments (Anderson, 2001a). Both conditions were met in our data sets. To perform the permutation tests, first, 10,000 replicate data sets were created by randomly re-assigning the residuals from the original data sets of each biological model among the different treatments (i.e. the residuals from the original data set were pooled and then assigned to a new treatment at
random). We then calculated the group means of the control and individual factor treatments in each replicate data set. These group means were used to calculate the expected means of the combined AB treatments by entering the group means into Eqs. (1)–(3). We then calculated the residuals within the control and each individual factor treatment (i.e. the residual between each data point and its group mean) and between the actual data points in the combined AB treatment and the expected means from the statistical models for each replicate data set. We calculated the test statistic by comparing the F-value for the A × B interaction term of the original data set with the potential range of F-values estimated from the F-values of the replicate data sets. To calculate the F-value for the A × B interaction term in the original data set, we summed the squares of the deviations of the replicate data points in the combined AB treatment from the expected value from the statistical model (i.e. the sum of 10 squared values) and divided by the sum of the squared within-group deviations of each replicate data point from its corresponding group mean (i.e. the sum of 40 squared values). We then calculated F-values for each replicate data set using the same procedure. The P-value is the probability that any permutation of the actual data produces an F-ratio more extreme than that of the actual data (Anderson and ter Braak, 2003). For example, if the F-ratio of the actual data is greater than that of the permuted data in 9500 of 10,000 permutations, the P-value would be 0.05. The permutation tests were performed using Matlab 7 (The Mathworks, Inc.). We compared the power of ANOVA and permutation tests to detect biological interactions by adding a factor representing a biological interaction ($i$) to each replicate of the combined AB treatment in data sets generated by both biological models (i.e. the $\mu_{AB}$ replicates were changed to $\mu_{AB} + i$). Interaction strengths ($i$) between 0.0 and 1.0 at intervals of 0.1 were used.
These interaction strengths simulate positive biological interactions between factor A and factor B that range from 0.0 to 10.0% of the value of the control mean. Power was defined as the proportion of rejections of the null hypothesis (Anderson and Legendre, 1999). We calculated the power of ANOVA and the permutation test to detect biological interactions when using multiplicative statistical models on the data generated by the multiplicative biological model and when using additive statistical models on the data generated by the additive biological model. The power of the permutation test using the Fowler and Rausher-based statistical model to detect biological interactions in the Fowler and Rausher-based data set was also calculated.
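The permutation procedure described above can be sketched as follows. This is a hedged reconstruction in Python rather than the authors' Matlab code: the function names, the way permuted residuals are added back to the fitted values, the default of 999 permutations, and the (hits + 1)/(n + 1) convention for the P-value are all assumptions made for illustration.

```python
import random

GROUPS = ("C", "A", "B", "AB")


def mean(values):
    return sum(values) / len(values)


def expected_ab(mean_c, mean_a, mean_b, model):
    """Null expectation for the AB mean under Eqs. (1)-(3)."""
    if model == "additive":
        return mean_a + mean_b - mean_c
    if model == "multiplicative":
        return mean_a * mean_b / mean_c
    if model == "fowler_rausher":
        return mean_c / (mean_c / mean_a + mean_c / mean_b - 1.0)
    raise ValueError(f"unknown model: {model}")


def f_ratio(groups, model):
    """Squared AB deviations from the model expectation, divided by the
    squared within-group deviations of all data points (as in the text)."""
    means = {g: mean(groups[g]) for g in GROUPS}
    target = expected_ab(means["C"], means["A"], means["B"], model)
    numerator = sum((x - target) ** 2 for x in groups["AB"])
    denominator = sum((x - means[g]) ** 2 for g in GROUPS for x in groups[g])
    return numerator / denominator


def permutation_test(groups, model, n_perm=999, seed=0):
    """Return (observed F-ratio, permutation P-value) for the A x B term."""
    rng = random.Random(seed)
    # Fitted values under the null: group means for C, A, B; model value for AB.
    fitted = {g: mean(groups[g]) for g in ("C", "A", "B")}
    fitted["AB"] = expected_ab(fitted["C"], fitted["A"], fitted["B"], model)
    residuals = [x - fitted[g] for g in GROUPS for x in groups[g]]
    observed = f_ratio(groups, model)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(residuals)  # pool residuals and reassign them at random
        pool = iter(residuals)
        permuted = {g: [fitted[g] + next(pool) for _x in groups[g]]
                    for g in GROUPS}
        if f_ratio(permuted, model) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Usage: one simulated data set from the multiplicative model with i = 0.6.
rng = random.Random(42)
data = {g: [rng.gauss(mu, 1.0) for _ in range(10)]
        for g, mu in (("C", 10.0), ("A", 6.0), ("B", 8.0), ("AB", 4.8 + 0.6))}
for model in ("additive", "multiplicative", "fowler_rausher"):
    f_obs, p = permutation_test(data, model)
    print(f"{model:>15}: F-ratio = {f_obs:.3f}, P = {p:.3f}")
```

Because the null expectation is supplied as an ordinary function of the group means, any other biological model can be tested by adding a branch to `expected_ab`, which is the flexibility the method is meant to provide.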
4. Results
Both ANOVA and permutation tests had similarly low Type-I error rates (i.e. the method erroneously identified interactions rarely) when the statistical model matched the biological model (e.g. with a multiplicative statistical model testing the multiplicative biological model; Table 1) in data sets generated by either biological model with no interaction factor. Both had high Type-I error rates when their underlying statistical model did not match the biological model (e.g. with an additive statistical model testing data sets generated by the
Table 1 – The Type-I error (95% CI) of ANOVA and permutation tests testing for an A × B interaction in the additive, multiplicative, and Fowler and Rausher-based data sets

ANOVA
                          Additive model     Multiplicative model   Square root transformation
Additive data             0.05 (0.02–0.12)   0.98 (0.93–1.00)       0.97 (0.92–1.00)
Multiplicative data       0.99 (0.94–1.00)   0.05 (0.02–0.12)       0.93 (0.87–0.97)
Fowler and Rausher data   0.99 (0.94–1.00)   0.92 (0.84–0.97)       0.94 (0.88–0.99)

Permutation test
                          Additive model     Multiplicative model   Fowler and Rausher model
Additive data             0.06 (0.02–0.13)   0.99 (0.94–1.00)       0.97 (0.92–1.00)
Multiplicative data       0.97 (0.92–1.00)   0.05 (0.01–0.12)       0.92 (0.84–0.97)
Fowler and Rausher data   1.00 (0.97–1.00)   0.86 (0.78–0.989)      0.06 (0.02–0.13)

The Type-I error rates measure how often each method erroneously detected an interaction (i.e. whether the statistical model incorrectly described the relationship between the factors when they were combined). ANOVA tested an additive model, a multiplicative model (by log transformation of the data) and an unknown model (by square root transformation of the data). The permutation tests used additive, multiplicative, and Fowler and Rausher-based statistical models.
multiplicative biological model; Table 1). ANOVA had high Type-I error rates under all of the underlying models tested when testing the data sets generated by the Fowler and Rausher-based model. Only the permutation test using the Fowler and Rausher-based statistical model had a low Type-I error rate (Table 1). ANOVA and permutation tests had similar power at all interaction strengths when testing the additive (Fig. 1) and multiplicative (Fig. 2) data sets that included the biological interaction factor ($i$). Permutation tests also had similar power to detect interactions in the Fowler and Rausher-based data set (power at $i = 0.60$ [95% CI] = 0.89 [0.81–0.94]) as in the multiplicative data set and slightly higher power than in the additive data set.
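Type-I error rates and power are both proportions of rejections across replicate data sets, reported with a 95% confidence interval. The sketch below shows one way such summaries could be computed. The source does not state which interval method was used, so the Wilson score interval here is an assumption, and the function names are hypothetical.

```python
import math


def rejection_rate(p_values, alpha=0.05):
    """Proportion of replicate data sets in which the null was rejected."""
    return sum(p <= alpha for p in p_values) / len(p_values)


def wilson_interval(successes, trials, z=1.959964):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / trials
                         + z * z / (4 * trials ** 2)) / denom
    return centre - half, centre + half


# Example: 89 rejections in 100 replicate data sets.
low, high = wilson_interval(89, 100)
print(f"rate = {89 / 100:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

For 89 rejections out of 100 this interval is close to the one reported for the Fowler and Rausher-based data set at $i = 0.60$, although that agreement does not establish which method the authors applied.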
5. Discussion
The results of our simulated experiments indicate that using an appropriate statistical model is crucial with both ANOVA and permutation tests to detect biological interactions. The ability of permutation tests to use a wider variety of statistical
Fig. 1 – The power (95% CI) of ANOVA and permutation tests using additive statistical models to test for A × B interactions in additive data sets over a range of interaction strengths (i).
models should enable this method to have wider applicability, although we cannot strongly generalize from testing only our two simulated data sets. For example, while both methods had low Type-I error rates and similar power to detect biological interactions when a multiplicative statistical model was appropriate, only permutation tests were able to test an appropriate statistical model for data based on the Fowler and Rausher (1985) model. When the statistical model tested by either method was different from the biological model that generated the data, both methods incorrectly detected biological interactions. Our literature survey found that a large proportion of published papers do not explicitly consider what hypothesis a chosen statistical method tests about the relationship between biological factors. Justification for the statistical method used in terms of what the statistical model assumes about the biological system and what statistical interactions would imply is often not being provided. Also, ecologists routinely employ statistical methods such as ANOVA without recognizing their potential shortcomings. For example, data
Fig. 2 – The power (95% CI) of ANOVA and permutation tests using multiplicative statistical models to test for A × B interactions in multiplicative data sets over a range of interaction strengths (i).
transformations are often used in order to meet ANOVA’s assumptions, although they change the hypothesis being tested. These results suggest that authors are concerned (correctly) about meeting the assumptions of ANOVA regarding the distribution of the data in order to have a reliable test, but many are less concerned or may not recognize the relationship between the statistical model being tested and the biological interaction being examined. Our survey updates and complements a similar survey by Wootton (1994). Wootton found that nine of 18 published studies of predator–prey interactions used additive models despite the fact that additive models provided a better fit than multiplicative models in only 10% of the total tests in the papers surveyed and predicted impossible values in 56% of the tests. We show that this is a continuing problem and, more generally, that the statistical models underlying tests are not being connected to specific biological hypotheses. The limited number of statistical models available to test using ANOVA is a result of necessary simplifications to the calculations used by the method. ANOVA was developed to provide a statistical method for comparing the effects of individual treatments and their combinations on a response variable in data gathered from factorial experiments long before computing power became widely available (Fisher, 1935). To simplify the required calculations, ANOVA assumes that data are all samples of normally distributed populations having equal variances. This allows the ratio of the between-treatment sum of squares and the within-treatment sum of squares to follow a tabulated normal-theory F distribution, which can easily be consulted, rather than needing to calculate the exact F distribution of the data through randomization (Fisher, 1935; Box et al., 1978; Anderson, 2001b).
Simplifying the test in this way not only greatly reduces the number of calculations but also necessarily requires an additive relationship between factors to be assumed, which limits the range of data types and possible hypotheses that can be analyzed. Although data can be transformed in order to test a different statistical model, the only available transformation is the log transformation, which corresponds to a multiplicative model. Moreover, data transformations can make analyses less biologically insightful by obscuring or changing the metric or scale of biological interest (Day and Quinn, 1989; Stanton and Thiede, 2005). Permutation tests may increase the ability of ecologists to evaluate biological interactions (see Section 1). One approach to test for such interactions statistically is to construct an appropriate null model of a system in the absence of such interactions. For example, a null model that represents the expected rate of prey reduction in the presence of two predators if there are no biological interactions between predators or predator and prey could assume that each predator consumes the same proportion of prey when together as when alone (i.e. the multiplicative model). Or, for example, if two allelopathic plants compete with a third, a null model for the growth rate of the third in the absence of allelopathy might be similar to the model proposed by Fowler and Rausher. Because ANOVA can only use two underlying statistical models (additive and multiplicative) as null models, it may not be appropriate to examine interactions in some circumstances using ANOVA. This limitation is highlighted by the
example of plant competition of Fowler and Rausher (1985), in which the combined effect of two plants on a third is not expected to follow null models available to ANOVA. We have shown that permutation tests can be used effectively with models not available to ANOVA (e.g. testing the Fowler and Rausher biological model). The more adaptable model testing of permutation tests should allow researchers to test more specific hypotheses than are possible with ANOVA. Selecting a biologically appropriate model for statistical tests is important not only to avoid identifying statistical interactions in data that do not correspond to biological interactions, but also to increase the usefulness of the statistical test. For example, identifying an interaction as being subadditive does not provide as much information as finding a negative interaction under a more specific statistical model (i.e. falsifying a more specific hypothesis). However, note that while permutation tests are more adaptable for model testing, not all data are amenable to analysis using this method. For example, when data are heterogeneously distributed, as would occur if the variance of the errors was correlated with the size of the means, permutation tests will not perform well (Anderson, 2001a; but see Manly and Francis, 1999).
6. Conclusions
With increasing interest in biological interactions, the statistical methods available in this area of research need to be discussed more widely in order to determine which perform best. At a minimum, any statistical method used must be able to test a statistical model that is based on a biologically reasonable null hypothesis. Although permutation tests can help to avoid errors due to biologically unrealistic models, they have requirements of their own, and an appropriate hypothesis and model need to be determined for each study. These considerations should be addressed when reporting the results of any statistical test for biological interactions. A statistical test should be used to help guide interpretation, and the benefits and limitations of any method should be known and taken into account.
Acknowledgements
We thank Earl Werner, Tim Wootton, and several anonymous reviewers for helpful comments on an earlier draft of this manuscript. The comments of one anonymous reviewer particularly helped to improve this manuscript. This work received support from the Michigan Agricultural Experimental Station to Scott Peacor. This is GLERL contribution number 1446.
references
Anderson, D.R., Burnham, K.P., Thompson, W.L., 2000. Null hypothesis testing: problems, prevalence, and an alternative. J. Wildl. Manag. 64, 912–923.
Anderson, M.J., 2001a. Permutation tests for univariate or multivariate analysis of variance and regression. Can. J. Fish. Aquat. Sci. 58, 626–639.
Anderson, M.J., 2001b. A new method for non-parametric multivariate analysis of variance. Austral Ecol. 26, 32–46.
Anderson, M.J., Legendre, P., 1999. An empirical comparison of permutation methods for tests of partial regression coefficients in a linear model. J. Stat. Comput. Simul. 62, 271–303.
Anderson, M.J., ter Braak, C.J.F., 2003. Permutation tests for multifactorial analysis of variance. J. Stat. Comput. Simul. 73, 85–113.
Billick, I., Case, T.J., 1994. Higher order interactions in ecological communities: what are they and how can they be detected? Ecology 75, 1529–1543.
Box, G.E.P., Hunter, W.G., Hunter, J.S., 1978. Statistics for Experimenters: an Introduction to Design, Data Analysis, and Model Building. John Wiley and Sons, New York.
ter Braak, C.J.F., 1992. Permutation versus bootstrap significance tests in multiple regression and ANOVA. In: Jockel, K., Rother, G., Sendler, W. (Eds.), Bootstrapping and Related Techniques. Springer-Verlag, New York, pp. 79–86.
Day, R.W., Quinn, G.P., 1989. Comparisons of treatments after an analysis of variance in ecology. Ecol. Monogr. 59, 433–463.
Fisher, R.A., 1935. Design of Experiments. Oliver and Boyd, Edinburgh.
Fowler, N.L., Rausher, M.D., 1985. Joint effects of competitors and herbivores on growth and reproduction in Aristolochia reticulata. Ecology 66, 1580–1587.
Greenland, S., 1993. Basic problems in interaction assessment. Environ. Health Perspect. 101 (Suppl. 4), 59–66.
Lortie, C.J., Brooker, R.W., Choler, P., Kikvidze, C., Michalet, R., Pugnaire, F.I., Callaway, R.M., 2004. Rethinking plant community theory. Oikos 107, 433–438.
Manly, B.F.J., Francis, R.I.C., 1999. Analysis of variance by randomization when variances are unequal. Aust. N. Z. J. Stat. 41, 411–429.
Relyea, R.A., 2004. The growth and survival of five amphibian species exposed to combinations of pesticides. Environ. Toxicol. Chem. 23, 1737–1742.
Richardson, D.M., Allsopp, N., D’Antonio, C.M., Milton, S.J., Rejmanek, M., 2000. Plant invasions: the role of mutualisms. Biol. Rev. 75, 65–93.
Schmitz, O.J., Krivan, V., Ovadia, O., 2004. Trophic cascades: the primacy of trait-mediated indirect interactions. Ecol. Lett. 7, 153–163.
Sih, A., Englund, G., Wooster, D., 1998. Emergent impacts of multiple predators on prey. TREE 13, 350–355.
Soluk, D.A., Collins, N.C., 1988. Synergistic interactions between fish and stoneflies: facilitation and interference among stream predators. Oikos 52, 94–100.
Stanton, M.L., Thiede, D.A., 2005. Statistical convenience vs biological insight: consequences of data transformation for the analysis of fitness variation in heterogeneous landscapes. New Phytol. 166, 319–338.
Wade, M.J., 1992. Sewall Wright: gene interaction in the shifting balance theory. In: Antonovics, J., Futuyma, D. (Eds.), Oxford Surveys of Evolutionary Biology, vol. VI. Oxford University Press, Oxford, pp. 35–62.
Werner, E.E., Peacor, S.D., 2003. A review of trait-mediated indirect interactions in ecological communities. Ecology 84, 1083–1100.
Wilbur, H.M., Fauth, J.E., 1990. Experimental aquatic food webs: interactions between two predators and two prey. Am. Nat. 135, 176–204.
Wootton, J.T., 1994. Putting the pieces together: testing the independence of interactions among organisms. Ecology 75, 1544–1551.