Chapter 14: Analysis of Variance
Analysis of variance
Determines whether differences exist between population means
Procedure works by analyzing the sample variance
Population means = treatment means
14.1: One-Way Analysis of Variance
Procedure that tests to determine whether differences exist between two or more population means
Technique analyzes the variance of the data to determine whether we can infer that the population means differ
Experimental design is a determinant in identifying the proper method to use
One-way analysis of variance
Procedure to apply when the samples are independently drawn
Figure 14.1: Sampling Scheme for Independent Samples (page 526)
Confirm that the data are interval and that the problem objective is to compare populations
Parameters are the four* population means: $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$
Null hypothesis will state that there are no differences between the population means
o $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
Analysis of variance determines whether there is enough statistical evidence to show that the null hypothesis is false
Alternative hypothesis will always specify the following:
o $H_1$: At least two means differ
Next step is to determine the test statistic
Response variable
The variable X is called the response variable
Responses
The values of the response variable X
Experimental unit
The unit we measure is called the experimental unit
Factor
The criterion by which we classify the population
Each population is called a factor level
Test Statistic
If the null hypothesis is true, the population means would all be equal and we would then expect the sample means to be close to one another
If the alternative hypothesis is true, there would be large differences between some of the sample means
Between-treatments variation
Measures the proximity of the sample means to each other
Denoted SST (sum of squares for treatments)
Sum of squares for treatments (SST)
If the sample means are close to each other, all of the sample means would be close to the grand mean; as a result SST would be small
SST achieves its smallest value (zero) when all the sample means are equal
A small value of SST supports the null hypothesis
Steps
Compute the sample means ($\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$) and the grand mean ($\bar{\bar{x}}$)
Compute the sample sizes ($n_1, n_2, \dots, n_k$) and total them to get $n$
Then calculate SST
If large differences exist between the sample means, making SST a large value, it is reasonable to reject the null hypothesis
Important question becomes: how large does the statistic have to be for us to justify rejecting the null hypothesis?
To answer this question, it is important to know how much variation exists in the response variable
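The SST steps above can be sketched in Python. This is only an illustration: it assumes the standard definition $SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$, and the data and the function name `sum_of_squares_treatments` are hypothetical, not taken from the text.

```python
# Hypothetical data: one list of responses per treatment (not from the text).
samples = [
    [10.0, 12.0, 11.0],
    [14.0, 15.0, 16.0],
    [9.0, 8.0, 10.0],
]


def sum_of_squares_treatments(samples):
    """Return SST = sum over treatments of n_j * (mean_j - grand_mean)^2."""
    n = sum(len(s) for s in samples)                 # total sample size
    grand_mean = sum(sum(s) for s in samples) / n    # mean of all observations
    return sum(
        len(s) * (sum(s) / len(s) - grand_mean) ** 2
        for s in samples
    )


sst = sum_of_squares_treatments(samples)
print(f"SST = {sst:.4f}")
```

When every sample has the same mean, each squared term is zero, so the function returns zero, matching the statement that SST is smallest when all the sample means are equal.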
Within-treatments variation
Provides a measure of the amount of variation in the response variable that is not caused by the treatments
Denoted by SSE (sum of squares for error)
Sum of squares for error (SSE)
Each of the k components of SSE is a measure of the variability of that sample
If we divide each component by $n_j - 1$, we obtain the sample variances. We can express this by rewriting SSE as:
o $SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2$
SSE is the combined or pooled variation of the k samples
A condition for SSE requires that the population variances be equal
o $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$
Steps
Calculate the sample variances ($s_1^2, s_2^2, \dots, s_k^2$)
Calculate SSE
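The two SSE steps above can be sketched as follows. The sketch assumes SSE is the sum of $(n_j - 1)s_j^2$ over the k samples; the data and the function name `sum_of_squares_error` are hypothetical.

```python
from statistics import variance

# Hypothetical data: one list of responses per treatment (not from the text).
samples = [
    [10.0, 12.0, 11.0],
    [14.0, 15.0, 16.0],
    [9.0, 8.0, 10.0],
]


def sum_of_squares_error(samples):
    """Return SSE = sum over samples of (n_j - 1) * s_j^2."""
    # statistics.variance is the sample variance (divisor n_j - 1).
    return sum((len(s) - 1) * variance(s) for s in samples)


sample_variances = [variance(s) for s in samples]
sse = sum_of_squares_error(samples)
print("sample variances:", sample_variances)
print(f"SSE = {sse:.4f}")
```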
Next step is to compute quantities called the mean squares
Sampling Distribution of the Test Statistic
Test statistic is F-distributed with k-1 and n-k degrees of freedom provided that the response variable is normally distributed
Steps
Calculate degrees of freedom
Calculate MSE and MST
Calculate test statistic F
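A minimal sketch of these three steps, assuming the usual definitions MST = SST/(k - 1), MSE = SSE/(n - k), and F = MST/MSE. The data and the function name `one_way_f` are hypothetical.

```python
from statistics import mean, variance

# Hypothetical data: one list of responses per treatment (not from the text).
samples = [
    [10.0, 12.0, 11.0],
    [14.0, 15.0, 16.0],
    [9.0, 8.0, 10.0],
]


def one_way_f(samples):
    """Return (df_treatments, df_error, MST, MSE, F) for independent samples."""
    k = len(samples)                          # number of treatments
    n = sum(len(s) for s in samples)          # total sample size
    grand_mean = sum(sum(s) for s in samples) / n

    sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    sse = sum((len(s) - 1) * variance(s) for s in samples)

    df_treatments = k - 1
    df_error = n - k
    mst = sst / df_treatments                 # mean square for treatments
    mse = sse / df_error                      # mean square for error
    return df_treatments, df_error, mst, mse, mst / mse


df_t, df_e, mst, mse, f_stat = one_way_f(samples)
print(f"df = ({df_t}, {df_e}), MST = {mst:.4f}, MSE = {mse:.4f}, F = {f_stat:.4f}")
```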
Rejection Region and p-Value
Purpose of calculating the F-statistic is to determine whether the value of SST is large enough to reject the null hypothesis
If SST is large, F will be large
We reject the null hypothesis only if:
o $F > F_{\alpha, k-1, n-k}$
Results of the analysis of variance are usually reported in an ANOVA table
Table 14.2: ANOVA Table for the One-Way Analysis of Variance (page 532)
If SST explains a significant portion of the total variation, we conclude that the population means differ
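As an illustration of how the pieces fit together, the sketch below assembles a one-way ANOVA table and applies the rejection rule. The data are hypothetical, and the critical value 5.14 is an assumption read from an F table for $\alpha = .05$ with 2 and 6 degrees of freedom; it is supplied by hand rather than computed.

```python
from statistics import mean

# Hypothetical data: one list of responses per treatment (not from the text).
samples = [
    [10.0, 12.0, 11.0],
    [14.0, 15.0, 16.0],
    [9.0, 8.0, 10.0],
]
F_CRITICAL = 5.14  # assumed table value of F(.05; 2, 6), entered by hand

k = len(samples)
all_values = [x for s in samples for x in s]
n = len(all_values)
grand_mean = mean(all_values)

sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
sse = sum((x - mean(s)) ** 2 for s in samples for x in s)
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

mst = sst / (k - 1)
mse = sse / (n - k)
f_stat = mst / mse


def fmt(value):
    """Format a table cell, leaving blanks where the table has no entry."""
    return "" if value is None else f"{value:.2f}"


rows = [
    ("Treatments", k - 1, sst, mst, f_stat),
    ("Error", n - k, sse, mse, None),
    ("Total", n - 1, ss_total, None, None),
]
print(f"{'Source':<12}{'df':>4}{'SS':>10}{'MS':>10}{'F':>10}")
for source, df, ss, ms, f in rows:
    print(f"{source:<12}{df:>4}{fmt(ss):>10}{fmt(ms):>10}{fmt(f):>10}")

reject_h0 = f_stat > F_CRITICAL
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The Total row is computed independently of the other two, so comparing it with SST + SSE is a quick check on the arithmetic.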
Completely randomized design
When the data are obtained through a controlled experiment in the one-way analysis of variance, we call the experimental design the completely randomized design of the analysis of variance
Checking the Required Conditions
The F-test of the analysis of variance requires that the random variable be normally distributed with equal variances
Normality requirement is easily checked graphically by producing the histograms for each sample
The equality of variances is examined by printing the sample standard deviations or variances
The similarity of sample variances allows us to assume that the population variances are equal
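A rough sketch of both checks in Python, using a text histogram in place of a chart. The data and treatment names are hypothetical, and judging whether the variances are "similar" is left to the reader, since the text gives no numerical rule.

```python
from collections import Counter
from statistics import stdev, variance

# Hypothetical data keyed by treatment name (not from the text).
samples = {
    "Treatment 1": [10.0, 12.0, 11.0, 13.0],
    "Treatment 2": [14.0, 15.0, 17.0, 16.0],
    "Treatment 3": [9.0, 8.0, 10.0, 11.0],
}

sample_variances = {name: variance(values) for name, values in samples.items()}

for name, values in samples.items():
    print(f"{name}: s = {stdev(values):.3f}, s^2 = {sample_variances[name]:.3f}")
    # Text histogram: one row per rounded value, one '*' per observation.
    counts = Counter(round(v) for v in values)
    for value in sorted(counts):
        print(f"  {value:>4} | {'*' * counts[value]}")
```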
Violation of the Required Conditions
If the data are not normally distributed, we can replace the one-way analysis of variance with its nonparametric counterpart, which is the Kruskal-Wallis Test
If the population variances are unequal, we can use several methods to correct the problem
Can We Use the t-Test of the Difference between Two Means Instead of the Analysis of Variance?
Analysis of variance test determines whether there is evidence of differences between two or more population means
The t-test of $\mu_1 - \mu_2$ determines whether there is evidence of a difference between two population means
The question arises, can we use t-tests instead of the analysis of variance?
In other words, instead of testing all the means in one test as in the analysis of variance, why not test each pair of means?
There are two reasons why we don’t use multiple t-tests instead of one F-test:
o We would have to perform many more calculations
o Conducting multiple tests increases the probability of making Type I errors
When we want to compare more than two populations of interval data, we use the analysis of variance
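Both reasons can be made concrete with a short calculation. The sketch below counts the pairwise t-tests needed for k populations and approximates the chance of at least one Type I error as $1 - (1 - \alpha)^C$. That formula assumes the C tests are independent, which pairwise tests on shared samples are not, so the figure is only indicative. The function name `familywise_error_rate` is hypothetical.

```python
from math import comb


def familywise_error_rate(k, alpha=0.05):
    """Return (number of pairwise tests, approximate P(at least one Type I error)).

    Assumes independent tests, which is a simplification.
    """
    num_tests = comb(k, 2)                       # k(k - 1)/2 pairs of means
    return num_tests, 1 - (1 - alpha) ** num_tests


for k in (3, 4, 6):
    num_tests, rate = familywise_error_rate(k)
    print(f"k = {k}: {num_tests} t-tests, overall Type I error rate about {rate:.3f}")
```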
Can We Use the Analysis of Variance Instead of the t-Test of the Difference between Two Means?
If we want to determine whether $\mu_1$ is greater than $\mu_2$, we cannot use the analysis of variance because this technique allows us to test for a difference only
If we want to test to determine whether one population mean exceeds the other, we must use the t-test
Moreover, the analysis of variance requires that the population variances be equal; if they are not, we must use the unequal-variances test statistic
Relationship between the F-Statistic and the t-Statistic
If we square the t-statistic (for the equal-variances t-test of $\mu_1 - \mu_2$), the result is the F-statistic
o $F = t^2$
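This relationship can be checked numerically for two samples. The sketch assumes the pooled (equal-variances) t-statistic and the one-way F-statistic with k = 2; the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical two-sample data (not from the text).
x1 = [10.0, 12.0, 11.0]
x2 = [14.0, 15.0, 16.0]
n1, n2 = len(x1), len(x2)

# Equal-variances t-statistic for the difference between two means.
pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
t_stat = (mean(x1) - mean(x2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way analysis of variance on the same two samples (k = 2).
grand_mean = mean(x1 + x2)
sst = n1 * (mean(x1) - grand_mean) ** 2 + n2 * (mean(x2) - grand_mean) ** 2
sse = (n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)
f_stat = (sst / (2 - 1)) / (sse / (n1 + n2 - 2))

print(f"t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```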
Developing an Understanding of Statistical Concepts
The F-test of the independent samples’ single-factor analysis of variance is an extension of the t-test of $\mu_1 - \mu_2$
If we simply want to determine whether a difference between two means exists, we can use the analysis of variance
Advantage of using the analysis of variance is that we can partition the total sum of squares, which enables us to measure how much variation is attributable to differences between populations and how much variation is attributable to differences within populations
14.3: Analysis of Variance Experimental Designs
Experimental design has been one of the factors that determines which technique we use
One-way analysis of variance is only one of many different experimental designs of the analysis of variance
For each type of experiment, we can describe the behaviour of the response variable using a mathematical expression or model
Single-Factor and Multifactor Experimental Designs
The criterion by which we identify populations is called a factor
Experiment described in section 14.1 is a single-factor analysis of variance because it addresses the problem of comparing two or more populations defined on the basis of only one factor
Multifactor experiment
One in which two or more factors define the treatments
Independent Samples and Blocks
Randomized block design
When the problem objective is to compare more than two populations, the experimental design that is the counterpart of the matched pairs experiment is called the randomized block design
Term block refers to a matched group of observations from each sample
The experimental design should reduce the variation in each treatment to make it easier to detect differences
Can also perform a blocked experiment by using the same subject (person, plant, or store) for each treatment; this is called a repeated measures design
Treat repeated measures designs as randomized block designs
Randomized block experiment is also called two-way analysis of variance
Fixed and Random Effects
Fixed-effects analysis of variance
If our analysis includes all possible levels of a factor
Random-effects analysis of variance
If the levels included in the study represent a random sample of all the levels that exist
In some experimental designs, there are no differences in calculations of the test statistic between fixed and random effects
14.4: Randomized Block (Two-Way) Analysis of Variance
The purpose of designing a randomized block experiment is to reduce the within-treatments variation to more easily detect differences between the treatment means
In the one-way analysis of variance, we partitioned the total variation into the between-treatments and within-treatments variation; that is,
o SS(Total) = SST + SSE
In the randomized block design of the analysis of variance, we partition the total variation into three sources of variation,
o SS(Total) = SST + SSB + SSE
Sum of squares for blocks (SSB)
Measures the variation between the blocks
When the variation associated with the blocks is removed, SSE is reduced, making it easier to determine whether differences exist between the treatment means
The calculations for this experimental design will be performed using Excel only
To help you understand the formulas, we will use the following notation:
Table 14.4: Notation for the Randomized Block Analysis of Variance
The definitions of SS(Total) and SST in the randomized block design are identical to those in the independent samples design
SSE in the independent samples design is equal to the sum of SSB and SSE in the randomized block design
Test is conducted by determining the mean squares, which are computed by dividing the sums of squares by their respective degrees of freedom
An interesting, and sometimes useful, by-product of the test of the treatment means is that we can also test to determine whether the block means differ
Will allow us to determine whether the experiment should have been conducted as a randomized block design
Test of the block means is almost identical to that of the treatment means except the test statistic is:
o $F = MSB/MSE$
F-distributed with $b - 1$ and $n - k - b + 1$ degrees of freedom (b = number of blocks, k = number of treatments)
Table 14.5: ANOVA Table for the Randomized Block Analysis of Variance
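Although the text performs these calculations in Excel, the partition SS(Total) = SST + SSB + SSE and the two F-statistics can be sketched in Python. The data are hypothetical, with one row per block and one column per treatment, and the error degrees of freedom are assumed to be n - k - b + 1 with n = kb.

```python
from statistics import mean

# Hypothetical data: rows are blocks, columns are treatments (not from the text).
data = [
    [10.0, 12.0, 14.0],
    [11.0, 14.0, 15.0],
    [8.0, 9.0, 13.0],
    [12.0, 13.0, 15.0],
]
b = len(data)        # number of blocks
k = len(data[0])     # number of treatments
n = b * k

grand_mean = mean(x for row in data for x in row)
treatment_means = [mean(row[j] for row in data) for j in range(k)]
block_means = [mean(row) for row in data]

ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
sst = b * sum((m - grand_mean) ** 2 for m in treatment_means)
ssb = k * sum((m - grand_mean) ** 2 for m in block_means)
sse = sum(
    (data[i][j] - treatment_means[j] - block_means[i] + grand_mean) ** 2
    for i in range(b)
    for j in range(k)
)

df_treatments, df_blocks, df_error = k - 1, b - 1, n - k - b + 1
mst = sst / df_treatments
msb = ssb / df_blocks
mse = sse / df_error

f_treatments = mst / mse   # tests whether the treatment means differ
f_blocks = msb / mse       # tests whether the block means differ
print(f"SST = {sst:.3f}, SSB = {ssb:.3f}, SSE = {sse:.3f}, SS(Total) = {ss_total:.3f}")
print(f"F (treatments) = {f_treatments:.3f}, F (blocks) = {f_blocks:.3f}")
```

Here `ss_total - sst` is the error sum of squares the same data would have in the independent samples design, so it equals `ssb + sse`.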
Checking the Required Conditions
The F-test of the randomized block design of the analysis of variance has the same requirements as the independent samples design
The random variable must be normally distributed and the population variances must be equal
Violation of the Required Conditions
When the response is not normally distributed, we can replace the randomized block analysis of variance with the Friedman test
Criteria for Blocking
Purpose of blocking is to reduce the variation caused by differences between the experimental units
By grouping the experimental units into homogeneous blocks with respect to the response variable, the statistics practitioner increases the chances of detecting actual differences between the treatment means
Developing an Understanding of Statistical Concepts
Randomized block experiment is an extension of the matched pairs experiment
In the randomized block experiment of the analysis of variance, we actually measure the variation between the blocks by computing SSB
The sum of squares for error is reduced by SSB, making it easier to detect differences between the treatments
14.5: Two-Factor Analysis of Variance
Factorial experiment
Can examine the effect on the response variable of two or more factors
Can use analysis of variance to determine whether the levels of each factor are different from one another
Will use only Excel to calculate
Complete factorial experiment
An experiment in which the data for all possible combinations of the levels of the factors are gathered
Will refer to one of the factors as factor A (arbitrarily chosen)
Number of levels of this factor will be denoted by a
The other factor is called factor B and its number of levels is denoted by b
Replicate
Number of observations for each combination
Number of replicates is denoted by r
Balanced
Problems in which the number of replicates is the same for each treatment
Can use a complete factorial experiment where the number of treatments is ab with r replicates per treatment
Variation caused by the treatments is measured by SST
To determine whether the differences result from factor A, factor B, or some interaction between the two factors, we need to partition SST into the three sources
These are SS(A), SS(B), and SS(AB)
Required conditions
o The distribution of the response is normally distributed
o The variance for each treatment is identical
o The samples are independent
Table 14.8: ANOVA Table for the Two-Factor Experiment
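The partition of SST into SS(A), SS(B), and SS(AB) for a balanced complete factorial experiment can be sketched in Python, although the text relies on Excel for these calculations. The data layout and values are hypothetical: `data[i][j]` holds the r replicates for level i of factor A and level j of factor B.

```python
from statistics import mean

# Hypothetical balanced data: a = 2 levels of A, b = 3 levels of B, r = 2 replicates.
data = [
    [[10.0, 12.0], [14.0, 15.0], [9.0, 11.0]],
    [[13.0, 14.0], [18.0, 17.0], [10.0, 9.0]],
]
a = len(data)
b = len(data[0])
r = len(data[0][0])
n = a * b * r

grand_mean = mean(x for row in data for cell in row for x in cell)
cell_means = [[mean(cell) for cell in row] for row in data]            # treatment means
a_means = [mean(x for cell in row for x in cell) for row in data]      # factor A level means
b_means = [mean(x for row in data for x in row[j]) for j in range(b)]  # factor B level means

sst = r * sum(
    (cell_means[i][j] - grand_mean) ** 2 for i in range(a) for j in range(b)
)
ss_a = r * b * sum((m - grand_mean) ** 2 for m in a_means)
ss_b = r * a * sum((m - grand_mean) ** 2 for m in b_means)
ss_ab = r * sum(
    (cell_means[i][j] - a_means[i] - b_means[j] + grand_mean) ** 2
    for i in range(a)
    for j in range(b)
)
sse = sum(
    (x - cell_means[i][j]) ** 2
    for i in range(a)
    for j in range(b)
    for x in data[i][j]
)
ss_total = sum((x - grand_mean) ** 2 for row in data for cell in row for x in cell)

print(f"SS(A) = {ss_a:.3f}, SS(B) = {ss_b:.3f}, SS(AB) = {ss_ab:.3f}")
print(f"SST = {sst:.3f}, SSE = {sse:.3f}, SS(Total) = {ss_total:.3f}")
```

The identity SST = SS(A) + SS(B) + SS(AB) holds here because the design is balanced; with unequal numbers of replicates this simple partition would not apply.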