Chapter 14: Analysis of Variance


Analysis of variance

• Determines whether differences exist between population means
• Procedure works by analyzing the sample variance
• Population means = treatment means

14.1: One-Way Analysis of Variance   

• Procedure that tests to determine whether differences exist between two or more population means
• Technique analyzes the variance of the data to determine whether we can infer that the population means differ
• Experimental design is a determinant in identifying the proper method to use

One-way analysis of variance  

• Procedure to apply when the samples are independently drawn
• Figure 14.1: Sampling Scheme for Independent Samples (page 526)

  

• Confirm that the data are interval and that the problem objective is to compare populations
• Parameters are the four* population means: $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$
• Null hypothesis will state that there are no differences between the population means
  o $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
• Analysis of variance determines whether there is enough statistical evidence to show that the null hypothesis is false
• Alternative hypothesis will always specify the following:
  o $H_1$: At least two means differ
• Next step is to determine the test statistic

  

Response variable 

The variable X is called the response variable

Responses 

The values of the response variable X

Experimental unit



The unit we measure is called the experimental unit

Factor  

• The criterion by which we classify the populations
• Each population is called a factor level

Test Statistic  

• If the null hypothesis is true, the population means would all be equal, and we would then expect the sample means to be close to one another
• If the alternative hypothesis is true, there would be large differences between some of the sample means

Between-treatments variation  

• Measures the proximity of the sample means to each other
• Denoted SST (sum of squares for treatments)

Sum of squares for treatments (SST)

  

• $SST = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2$
• If the sample means are close to each other, all of the sample means would be close to the grand mean; as a result, SST would be small
• SST achieves its smallest value (zero) when all the sample means are equal
• A small value of SST supports the null hypothesis

Steps   

• Compute the sample means ($\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$) and the grand mean (X double bar, $\bar{\bar{x}}$)
• Compute the sample sizes ($n_1, n_2, \ldots, n_k$) and total them to get $n$
• Then calculate SST

• If large differences exist between the sample means, making SST a large value, it is reasonable to reject the null hypothesis
• Important question becomes: how large does the statistic have to be for us to justify rejecting the null hypothesis?
• To answer this question, it is important to know how much variation exists in the response variable
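The SST steps above can be sketched in a few lines of Python. This is only an illustration: the data and the names `samples`, `mean`, and `sum_of_squares_treatments` are hypothetical and do not come from the chapter.

```python
# Hypothetical data: three treatments, three observations each.
samples = {
    "A": [1.0, 2.0, 3.0],
    "B": [3.0, 4.0, 5.0],
    "C": [5.0, 6.0, 7.0],
}

def mean(values):
    return sum(values) / len(values)

def sum_of_squares_treatments(groups):
    """SST = sum over treatments of n_j * (sample mean_j - grand mean)^2."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)  # X double bar
    return sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

sst = sum_of_squares_treatments(list(samples.values()))
print(f"SST = {sst:.2f}")
```

With these made-up numbers the sample means are 2, 4, and 6 and the grand mean is 4, so SST works out to 24. If all three samples shared the same mean, the function would return zero.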

Within-treatments variation  

• Provides a measure of the amount of variation in the response variable that is not caused by the treatments
• Denoted by SSE (sum of squares for error)

Sum of squares for error (SSE)

 

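• $SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^2$
• Each of the $k$ components of SSE is a measure of the variability of that sample
• If we divide each component by $n_j - 1$, we obtain the sample variances. We can express this by rewriting SSE as:
  o $SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$
• SSE is the combined or pooled variation of the $k$ samples
• A condition for SSE requires that the population variances be equal
  o $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$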

Steps  

• Calculate the sample variances ($s_1^2, s_2^2, \ldots, s_k^2$)
• Calculate SSE
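A short Python sketch of these two steps follows, assuming hypothetical data. It computes SSE both as the pooled form (each sample variance weighted by its sample size minus one) and directly from squared deviations, to show that the two agree. The function names are illustrative only.

```python
from statistics import variance

# Hypothetical data: three treatments, three observations each.
samples = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [5.0, 6.0, 7.0],
]

def sum_of_squares_error(groups):
    """SSE as the pooled variation: sum of (n_j - 1) * s_j^2."""
    return sum((len(g) - 1) * variance(g) for g in groups)

def sum_of_squares_error_direct(groups):
    """SSE from its definition: squared deviations from each sample mean."""
    total = 0.0
    for g in groups:
        g_mean = sum(g) / len(g)
        total += sum((x - g_mean) ** 2 for x in g)
    return total

sse = sum_of_squares_error(samples)
sse_direct = sum_of_squares_error_direct(samples)
print(f"SSE (pooled) = {sse:.2f}, SSE (direct) = {sse_direct:.2f}")
```

Here `statistics.variance` is the sample variance with an n - 1 denominator, which is what the pooled form requires.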



Next step is to compute quantities called the mean squares

Sampling Distribution of the Test Statistic 

Test statistic is F-distributed with k-1 and n-k degrees of freedom provided that the response variable is normally distributed

Steps   

• Calculate degrees of freedom
• Calculate MSE and MST
• Calculate test statistic F
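As a rough sketch of these three steps, the Python below takes the usual definitions MST = SST/(k - 1), MSE = SSE/(n - k), and F = MST/MSE. The data and the function name `one_way_anova` are hypothetical, chosen only for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def one_way_anova(groups):
    """Return degrees of freedom, mean squares, and F for k independent samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_treatments, df_error = k - 1, n - k
    mst = sst / df_treatments
    mse = sse / df_error
    return {
        "df": (df_treatments, df_error),
        "MST": mst,
        "MSE": mse,
        "F": mst / mse,
    }

# Hypothetical data: three treatments, three observations each.
result = one_way_anova([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 7.0]])
print(result)
```

For this made-up data SST = 24 and SSE = 6, so MST = 12, MSE = 1, and F = 12 with 2 and 6 degrees of freedom. Judging whether that F is large enough still requires the F distribution's critical value or a p-value, which this sketch does not compute.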

Rejection Region and p-Value     

• Purpose of calculating the F-statistic is to determine whether the value of SST is large enough to reject the null hypothesis
• If SST is large, F will be large
• We reject the null hypothesis only if:
  o $F > F_{\alpha,\,k-1,\,n-k}$
• Results of the analysis of variance are usually reported in an ANOVA table
• Table 14.2: ANOVA Table for the One-Way Analysis of Variance (page 532)



If SST explains a significant portion of the total variation, we conclude that the population means differ

Completely randomized design 

When the data are obtained through a controlled experiment in the one-way analysis of variance, we call the experimental design the completely randomized design of the analysis of variance

Checking the Required Conditions    

• The F-test of the analysis of variance requires that the random variable be normally distributed with equal variances
• Normality requirement is easily checked graphically by producing the histograms for each sample
• The equality of variances is examined by printing the sample standard deviations or variances
• The similarity of sample variances allows us to assume that the population variances are equal
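Printing the sample standard deviations and variances, as described above, might look like the following in Python. The samples and the helper `spread_summary` are hypothetical; this is an informal side-by-side comparison, not a formal test of equal variances.

```python
from statistics import stdev, variance

# Hypothetical samples, one list per treatment.
samples = {
    "Treatment 1": [1.0, 2.0, 3.0],
    "Treatment 2": [3.0, 4.0, 5.0],
    "Treatment 3": [5.0, 6.0, 7.0],
}

def spread_summary(groups):
    """Sample standard deviation and variance (n - 1 denominator) per treatment."""
    return {name: (stdev(obs), variance(obs)) for name, obs in groups.items()}

summary = spread_summary(samples)
for name, (s, s2) in summary.items():
    print(f"{name}: s = {s:.3f}, s^2 = {s2:.3f}")
```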

Violation of the Required Conditions  

• If the data are not normally distributed, we can replace the one-way analysis of variance with its nonparametric counterpart, which is the Kruskal-Wallis test
• If the population variances are unequal, we can use several methods to correct the problem

Can We Use the t-Test of the Difference between Two Means Instead of the Analysis of Variance?





• Analysis of variance test determines whether there is evidence of differences between two or more population means
• The t-test of $\mu_1 - \mu_2$ determines whether there is evidence of a difference between two population means
• The question arises: can we use t-tests instead of the analysis of variance? In other words, instead of testing all the means in one test as in the analysis of variance, why not test each pair of means?
• There are two reasons why we don't use multiple t-tests instead of one F-test:
  o We would have to perform many more calculations
  o Conducting multiple tests increases the probability of making Type I errors
• When we want to compare more than two populations of interval data, we use the analysis of variance
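Both reasons can be made concrete with a small calculation. The sketch below counts the pairwise tests needed for k populations and estimates the chance of at least one Type I error across them. It rests on a simplifying assumption that is not in the notes: the tests are treated as independent, each run at the same significance level. The function name `familywise_error_rate` is illustrative.

```python
from math import comb

def familywise_error_rate(k, alpha=0.05):
    """Number of pairwise tests and the chance of at least one Type I error.

    Assumes the pairwise tests are independent, which is a simplification.
    """
    num_tests = comb(k, 2)  # one t-test per pair of populations
    return num_tests, 1 - (1 - alpha) ** num_tests

for k in (3, 4, 6):
    tests, rate = familywise_error_rate(k)
    print(f"k={k}: {tests} pairwise tests, P(at least one Type I error) ~ {rate:.3f}")
```

Under that assumption, comparing six populations takes 15 t-tests, and at a 5% significance level the chance of at least one Type I error is roughly 0.54, compared with 0.05 for a single F-test.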

Can We Use the Analysis of Variance Instead of the t-Test of the Difference between Two Means?  

• If we want to determine whether $\mu_1$ is greater than $\mu_2$, we cannot use the analysis of variance because this technique allows us to test for a difference only
• If we want to test to determine whether one population mean exceeds the other, we must use the t-test
• Moreover, the analysis of variance requires that the population variances be equal; if they are not, we must use the unequal-variances test statistic

Relationship between the F-Statistic and the t-Statistic 

• If we square the t-statistic, the result is the F-statistic
  o $F = t^2$
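The relationship can be checked numerically. The sketch below, on hypothetical two-sample data, computes the equal-variances (pooled) t-statistic and the one-way ANOVA F-statistic and compares t squared with F. The identity applies to the pooled t-statistic, not the unequal-variances version. All names are illustrative.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def pooled_t_statistic(x1, x2):
    """Equal-variances t-statistic for the difference between two means."""
    n1, n2 = len(x1), len(x2)
    ss1 = sum((x - mean(x1)) ** 2 for x in x1)
    ss2 = sum((x - mean(x2)) ** 2 for x in x2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

def f_statistic(groups):
    """One-way ANOVA F = MST / MSE."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (sst / (k - 1)) / (sse / (n - k))

# Hypothetical data for two populations.
x1 = [1.0, 2.0, 3.0]
x2 = [3.0, 4.0, 5.0]
t = pooled_t_statistic(x1, x2)
f = f_statistic([x1, x2])
print(f"t^2 = {t ** 2:.4f}, F = {f:.4f}")
```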

Developing an Understanding of Statistical Concepts   

• The F-test of the independent samples single-factor analysis of variance is an extension of the t-test of $\mu_1 - \mu_2$
• If we simply want to determine whether a difference between two means exists, we can use the analysis of variance
• Advantage of using the analysis of variance is that we can partition the total sum of squares, which enables us to measure how much variation is attributable to differences between populations and how much variation is attributable to differences within populations

14.3: Analysis of Variance Experimental Designs   

• Experimental design has been one of the factors that determine which technique we use
• One-way analysis of variance is only one of many different experimental designs of the analysis of variance
• For each type of experiment, we can describe the behaviour of the response variable using a mathematical expression or model

Single-Factor and Multifactor Experimental Designs  

• The criterion by which we identify populations is called a factor
• Experiment described in Section 14.1 is a single-factor analysis of variance because it addresses the problem of comparing two or more populations defined on the basis of only one factor

Multifactor experiment 

One in which two or more factors define the treatments

Independent Samples and Blocks

Randomized block design

• When the problem objective is to compare more than two populations, the experimental design that is the counterpart of the matched pairs experiment is called the randomized block design
• Term block refers to a matched group of observations from each sample
• The experimental design should reduce the variation in each treatment to make it easier to detect differences
• Can also perform a blocked experiment by using the same subject (person, plant, or store) for each treatment; this is called a repeated measures design
• Treat repeated measures designs as randomized block designs
• Randomized block experiment is also called two-way analysis of variance

Fixed and Random Effects

Fixed-effects analysis of variance

If our analysis includes all possible levels of a factor

Random-effects analysis of variance  

• If the levels included in the study represent a random sample of all the levels that exist
• In some experimental designs, there are no differences in calculations of the test statistic between fixed and random effects

14.4: Randomized Block (Two-Way) Analysis of Variance  



• The purpose of designing a randomized block experiment is to reduce the within-treatments variation to more easily detect differences between the treatment means
• In the one-way analysis of variance, we partitioned the total variation into the between-treatments and within-treatments variation; that is,
  o SS(Total) = SST + SSE
• In the randomized block design of the analysis of variance, we partition the total variation into three sources of variation,
  o SS(Total) = SST + SSB + SSE

Sum of squares for blocks (SSB)    

• Measures the variation between the blocks
• When the variation associated with the blocks is removed, SSE is reduced, making it easier to determine whether differences exist between the treatment means
• The calculations for this experimental design will be performed using Excel only
• To help you understand the formulas, we will use the following notation:



Table 14.4: Notation for the Randomized Block Analysis of Variance



• The definitions of SS(Total) and SST in the randomized block design are identical to those in the independent samples design
• SSE in the independent samples design is equal to the sum of SSB and SSE in the randomized block design
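The notes leave these calculations to Excel. Purely as an illustration of the partition, the Python sketch below computes SS(Total), SST, SSB, and SSE for a small hypothetical data set with one observation per treatment-block combination. The layout (rows as blocks, columns as treatments) and all names are assumptions made for this example.

```python
# Hypothetical data: rows are blocks, columns are treatments (b = 3, k = 3).
data = [
    [2.0, 2.0, 5.0],
    [2.0, 6.0, 7.0],
    [2.0, 4.0, 6.0],
]

def mean(values):
    return sum(values) / len(values)

def randomized_block_sums_of_squares(rows):
    b, k = len(rows), len(rows[0])
    grand_mean = mean([x for row in rows for x in row])
    treatment_means = [mean([row[j] for row in rows]) for j in range(k)]
    block_means = [mean(row) for row in rows]

    ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
    sst = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    ssb = k * sum((m - grand_mean) ** 2 for m in block_means)
    # Error: what is left after removing treatment and block effects.
    sse = sum(
        (rows[i][j] - treatment_means[j] - block_means[i] + grand_mean) ** 2
        for i in range(b) for j in range(k)
    )
    return {"SS(Total)": ss_total, "SST": sst, "SSB": ssb, "SSE": sse}

ss = randomized_block_sums_of_squares(data)
print(ss)
```

For this made-up data the partition is 34 = 24 + 6 + 4. Ignoring the blocks and treating the columns as independent samples would give a within-treatments sum of squares of 10, which equals SSB + SSE here.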





Test is conducted by determining the mean squares, which are computed by dividing the sums of squares by their respective degrees of freedom

  

• An interesting, and sometimes useful, by-product of the test of the treatment means is that we can also test to determine whether the block means differ
• Will allow us to determine whether the experiment should have been conducted as a randomized block design
• Test of the block means is almost identical to that of the treatment means except the test statistic is:
  o $F = \dfrac{MSB}{MSE}$
• F-distributed with $\nu_1 = b - 1$ and $\nu_2 = n - k - b + 1$ degrees of freedom
• Table 14.5: ANOVA Table for the Randomized Block Analysis of Variance
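A minimal Python sketch of the two tests follows. It assumes the standard degrees of freedom for this design (k - 1 for treatments, b - 1 for blocks, n - k - b + 1 for error, with n = kb) and uses hypothetical sums of squares. The function name `randomized_block_f` is illustrative.

```python
def randomized_block_f(sst, ssb, sse, k, b):
    """Mean squares and F-statistics for treatments and blocks.

    Assumes one observation per treatment-block cell, so n = k * b.
    """
    n = k * b
    df_treatments, df_blocks = k - 1, b - 1
    df_error = n - k - b + 1
    mst = sst / df_treatments
    msb = ssb / df_blocks
    mse = sse / df_error
    return {
        "MST": mst, "MSB": msb, "MSE": mse,
        "F_treatments": mst / mse,  # tests the treatment means
        "F_blocks": msb / mse,      # tests the block means
        "df_error": df_error,
    }

# Hypothetical sums of squares for k = 3 treatments and b = 3 blocks.
result = randomized_block_f(sst=24.0, ssb=6.0, sse=4.0, k=3, b=3)
print(result)
```

Both F-statistics share the same denominator, MSE, so any reduction in SSE achieved by blocking raises both ratios.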

Checking the Required Conditions 

The F-test of the randomized block design of the analysis of variance has the same requirements as the independent samples design

 

• The random variable must be normally distributed and the population variances must be equal
• Equality of variances requirement also has to be met

Violation of the Required Conditions 

When the response is not normally distributed, we can replace the randomized block analysis of variance with the Friedman test

Criteria for Blocking  

• Purpose of blocking is to reduce the variation caused by differences between the experimental units
• By grouping the experimental units into homogeneous blocks with respect to the response variable, the statistics practitioner increases the chances of detecting actual differences between the treatment means

Developing an Understanding of Statistical Concepts   

• Randomized block experiment is an extension of the matched pairs experiment
• In the randomized block experiment of the analysis of variance, we actually measure the variation between the blocks by computing SSB
• The sum of squares for error is reduced by SSB, making it easier to detect differences between the treatments

14.5: Two-Factor Analysis of Variance

Factorial experiment

• Can examine the effect on the response variable of two or more factors
• Can use analysis of variance to determine whether the levels of each factor are different from one another
• Will use only Excel to calculate

Complete factorial experiment  

• An experiment in which the data for all possible combinations of the levels of the factors are gathered
• Will refer to one of the factors as factor A (arbitrarily chosen)
• Number of levels of this factor will be denoted by a
• The other factor is called factor B and its number of levels is denoted by b

Replicate  

• Number of observations for each combination
• Number of replicates is denoted by r

Balanced     

• Problems in which the number of replicates is the same for each treatment
• Can use a complete factorial experiment where the number of treatments is ab with r replicates per treatment
• Variation caused by the treatments is measured by SST
• To determine whether the differences result from factor A, factor B, or some interaction between the two factors, we need to partition SST into the three sources
• These are SS(A), SS(B), and SS(AB)
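The notes leave these calculations to Excel. As an illustration only, the Python sketch below partitions SST into SS(A), SS(B), and SS(AB) for a balanced design, using the standard formulas for a complete factorial experiment. The data, level labels, and function name are hypothetical.

```python
# Hypothetical balanced data: a = 2 levels of A, b = 2 levels of B, r = 2 replicates.
cells = {
    ("A1", "B1"): [1.0, 3.0],
    ("A1", "B2"): [4.0, 6.0],
    ("A2", "B1"): [5.0, 7.0],
    ("A2", "B2"): [10.0, 12.0],
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def two_factor_sums_of_squares(cells):
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    r = len(next(iter(cells.values())))  # replicates per treatment (balanced)
    grand = mean(x for obs in cells.values() for x in obs)
    cell_mean = {key: mean(obs) for key, obs in cells.items()}
    # In a balanced design, a factor-level mean is the mean of its cell means.
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    ss_a = r * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = r * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = r * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    sst = r * sum((m - grand) ** 2 for m in cell_mean.values())
    sse = sum((x - cell_mean[key]) ** 2 for key, obs in cells.items() for x in obs)
    return {"SS(A)": ss_a, "SS(B)": ss_b, "SS(AB)": ss_ab, "SST": sst, "SSE": sse}

ss = two_factor_sums_of_squares(cells)
print(ss)
```

For this made-up data SST = 84 splits into SS(A) = 50, SS(B) = 32, and SS(AB) = 2, with SSE = 8 left as within-treatment variation. The partition relies on the design being balanced.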





• Required conditions
  o The response is normally distributed
  o The variance for each treatment is identical
  o The samples are independent
• Table 14.8: ANOVA Table for the Two-Factor Experiment
