ANALYSIS OF VARIANCE (ANOVA)
INTRODUCTION
• More than two groups (J > 2), e.g. treatment vs. standard treatment vs. control
• We could compare each pair, but running multiple t-tests inflates the Type I error rate (a rough calculation follows this list)
  o i.e. if one group is "atypical" (not representative, produces an effect that doesn't actually exist in the population) by chance, this produces two Type I errors
  o ANOVA simultaneously compares J means in one omnibus test, keeping the Type I error rate at α
  o Assumptions are the same as for the t-test
• "One-way" analysis of variance → just one independent variable
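A rough illustration of the inflation (assuming, for simplicity, that the pairwise tests were independent, which they are not exactly): with J = 3 groups there are 3 pairwise t-tests, so

    P(at least one Type I error) = 1 − (1 − α)^3 = 1 − (0.95)^3 ≈ .14

i.e. roughly .14 rather than the nominal .05, and the problem gets worse as the number of groups grows.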
HYPOTHESES
• Null hypothesis: H0: µ1 = µ2 = µ3 = … = µJ (for J groups, or levels, J ≥ 2)
  o "Nothing systematic is happening": the samples are from the same population, there are no differential treatment effects, all means are equal, there are no differences in the dependent variable as a function of the IV
• Alternative hypothesis: HA: not H0
  o At least one mean is different (not directional)
VARIANCE
• We want to compare means, so why analysis of variance?
  o Why do scores vary? i.e., why does Person A in Group 1 have a different score from Person B in Group 2?
  o 1. Because they are experiencing different "treatments" – between groups
  o 2. For other reasons not accounted for – between groups and within groups
• Within-group variability
  o Individual differences, "random" error; basically, any variable not accounted for in the analysis
  o Because of the assumption of homogeneity of variance, we estimate within-group variability using pooled variance:
    MSW = SSW / dfW = (Σ SSj) / (N − J)

  o MSW = the estimate of the common population variance based on variability within groups
  o SSW = the sum of all the SSj (comparing each score to its group mean)
  o The degrees of freedom within groups: dfW = N – J
• Between-group variability
  o Group means vary around the grand mean (just as raw scores vary around group means)
  o Grand mean: X̄G = (sum of all scores) / N
  o Variance between means = sum of squared deviations of group means from the grand mean:
    MSB = SSB / dfB = n Σ (X̄j − X̄G)² / (J − 1)

  o SSB = n times the sum of squared deviations of group means from the grand mean (comparing each group mean to the grand mean; a worked mini example follows this list)
  o The degrees of freedom between groups: dfB = J – 1
  o If all the group means are equal, SSB = 0, so MSB = 0 and F = 0
• Variability if H0 is true / not true
  o If the population means are different (i.e., H0 is not true), the sample means will still differ because of sampling variability, but also because of systematic group differences
  o If H0 is true, MSB = error variance (and should be about the same as MSW)
  o If H0 is not true, MSB = error variance + treatment effect (and should be greater than MSW)
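A worked mini example (made-up numbers, J = 3 groups of n = 2 scores each, purely to show the mechanics):

    Group 1: 1, 3 (mean 2); Group 2: 4, 6 (mean 5); Group 3: 7, 9 (mean 8); grand mean X̄G = 30 / 6 = 5
    SSW = (1−2)² + (3−2)² + (4−5)² + (6−5)² + (7−8)² + (9−8)² = 6;   dfW = N − J = 6 − 3 = 3;   MSW = 6 / 3 = 2
    SSB = 2 × [(2−5)² + (5−5)² + (8−5)²] = 2 × 18 = 36;   dfB = J − 1 = 2;   MSB = 36 / 2 = 18
    Check: SST = 16 + 4 + 1 + 1 + 4 + 16 = 42 = SSB + SSW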
F RATIO
• Calculating the F ratio is our goal:

    F = MSB / MSW = (between-group variance) / (within-group variance) = (treatment + error) / error

  o If H0 is true, then treatment = 0 (F ≈ 1)
  o If H0 is not true, then treatment > 0 (F > 1)
• The F tables: we now have two degrees of freedom
  o 5 percent points = 0.05 (use this one unless told otherwise), 1 percent points = 0.01
• Compare observed F to critical F
  o If observed F > critical F, reject H0
• ANOVA summary table (a code sketch of these steps follows this section):

    Source    SS    df           MS              F
    Between   SSB   dfB = J − 1  MSB = SSB/dfB   F = MSB/MSW
    Within    SSW   dfW = N − J  MSW = SSW/dfW
    Total     SST   dfT = N − 1
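A minimal Python sketch of the whole calculation, reusing the made-up mini data from above; the scipy calls shown are one common way to get the critical value and to cross-check the result, not the method prescribed in these notes:

```python
import numpy as np
from scipy import stats

# Made-up data: J = 3 groups, n = 2 scores per group (illustration only)
groups = [np.array([1.0, 3.0]), np.array([4.0, 6.0]), np.array([7.0, 9.0])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
J, N = len(groups), all_scores.size

# Partition the sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between, df_within = J - 1, N - J
ms_between = ss_between / df_between
ms_within = ss_within / df_within
F = ms_between / ms_within                         # observed F ratio

F_crit = stats.f.ppf(0.95, df_between, df_within)  # critical F at alpha = .05
p_value = stats.f.sf(F, df_between, df_within)     # P(F >= observed F | H0 true)

print(f"F({df_between}, {df_within}) = {F:.2f}, critical F = {F_crit:.2f}, p = {p_value:.3f}")

# Cross-check against scipy's built-in one-way ANOVA
print(stats.f_oneway(*groups))
```

With only two scores per group, the observed F of 9 just misses the .05 critical value (about 9.55): even a large effect has little power in a tiny sample.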
EFFECT SIZE
• We have multiple means, so the effect size is the sum of squares between divided by the total sum of squares:

    η² = SSB / SST
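For the made-up mini example above: η² = 36 / 42 ≈ .86, i.e. about 86% of the total variability in those (invented) scores is associated with group membership.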
CALCULATING DEGREES OF FREEDOM
• Number of observations minus number of constraints, e.g. F(5, 60):
  o J – 1 = 5 → so J = 6
  o N – J = 60 → so N = 66
  o Therefore there are 66 participants and 6 different groups
  o If the n's are equal, there are 11 participants per group
MULTIPLE COMPARISONS
• Omnibus test:
  o Observed F = 6.28 > critical F = 2.84, reject H0
  o Conclusion: The amount of bacteria retained appears to depend on whether and how a beard is washed, F(3, 44) = 6.28, p < .05.
  o The Type I error rate is .05 because there is only one inference, i.e., that the group means are not all the same. This is unsatisfactory: where are the differences? How can we test other differences without inflating the Type I error rate?
  o → we use multiple comparisons
• Tukey's HSD (Honestly Significant Difference)
  o Tests all pairwise comparisons while maintaining the Type I error rate at the specified level (see the code sketch after this list)
  o e.g., the experiment gets an α of .05 to share (experiment-wise error rate, EER), rather than each decision getting .05 (decision-wise error rate, DER)
  o Conclusion: Pairwise comparisons using the Tukey HSD procedure at the .05 level revealed that bacteria count was significantly lower for the no-beard group than for the other three groups. Bacteria counts after splash wash, shower-stream wash and no wash did not differ significantly from each other.
• Post hoc tests
  o q, the studentised range statistic
  o Considers how often the largest difference between means would be significant if the omnibus H0 is true
  o Not necessarily "post hoc"
• Omnibus test versus multiple comparisons; it is possible that:
  o the omnibus test is significant but no pairwise comparisons are
  o the omnibus test is non-significant but some pairwise comparisons are
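A sketch of running Tukey's HSD in Python. The beard-washing data are not available, so the scores and group labels below are invented placeholders, and statsmodels' pairwise_tukeyhsd is one common implementation rather than necessarily the procedure used in the original example:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)

# Invented placeholder data: bacteria counts for four hypothetical conditions
conditions = ["no beard", "splash wash", "shower wash", "no wash"]
true_means = [20, 35, 34, 38]            # made-up group means
scores, labels = [], []
for cond, m in zip(conditions, true_means):
    scores.extend(rng.normal(loc=m, scale=6, size=12))   # n = 12 per group
    labels.extend([cond] * 12)

# All pairwise comparisons, holding the experiment-wise error rate at .05
result = pairwise_tukeyhsd(endog=np.array(scores), groups=np.array(labels), alpha=0.05)
print(result.summary())
```

The printed summary lists each pair of groups, the mean difference, and whether H0 is rejected, all at a single experiment-wise α of .05.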
FACTORIAL ANOVA
TWO-WAY ANOVA
• Factorial = every possible combination of levels
• Example: what might determine a client's satisfaction with therapy? Maybe the type of therapy, maybe the experience of the therapist, or maybe the combination of these two.
• A two-way ANOVA has 2 independent variables
  o In a 2 × 2 design (4 groups in total), each independent variable has 2 levels – but two-way ANOVAs can also be 2 × 3, 4 × 6, etc. – there are J × K groups in total
• In a two-way ANOVA, there will be three effects → three hypotheses:
  1. The effect of one IV (e.g., type of therapy) on the DV, e.g. "is there a difference in client satisfaction between group and individual therapy?"
  2. The effect of the other IV (e.g., therapist experience) on the DV, e.g. "is there a difference in client satisfaction between new and experienced therapists?"
  3. The interaction between the two IVs, e.g. "does the difference between new and experienced therapists depend on the type of therapy?" OR "does the difference between group and individual therapy depend on therapist experience?"
• An interaction indicates that the influence of one IV on the DV changes according to the level of the other IV
  o It is independent of the main effects
  o It is different from the sum of the individual effects; it qualifies them
• Estimating the components of variance (partitioning variance): SST = SS(A) + SS(B) + SS(A×B) + SSW
• Critical F is based on df(effect) (separate for each effect) and df(error) (the same for all effects)
• Effect size: accounted for → SS(A)/SST + SS(B)/SST + SS(A×B)/SST (unaccounted for → SSW/SST)
• Advantages (see the sketch after this list for a two-way ANOVA in practice):
  o ANOVA can find interaction effects
  o ANOVA is more economical (regarding sample size)
  o ANOVA can account for more variability of the DV
    ▪ Suppose we had just looked at the effects of one IV (a one-way ANOVA) with the same overall means for the DV. The total SS and the SS for that effect are still the same, so η² for this effect is still the same
    ▪ But now within-group variability is a lot higher, because the variance that the other IV and the interaction accounted for is now unaccounted for and is treated as error variance
    ▪ Larger MSW (within-group variance/individual differences/error) = less power
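A sketch of a two-way ANOVA with an interaction in Python, using the therapy-satisfaction example as a frame. The data, cell means and effect sizes below are entirely invented, and statsmodels' ols + anova_lm is one common way to fit such a model, not the method prescribed in these notes:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# Invented 2 x 2 factorial data: therapy type x therapist experience, n = 15 per cell
rows = []
for therapy in ["group", "individual"]:
    for experience in ["new", "experienced"]:
        base = 50 + (5 if therapy == "individual" else 0) + (4 if experience == "experienced" else 0)
        if therapy == "individual" and experience == "experienced":
            base += 6                              # made-up interaction effect
        for score in rng.normal(loc=base, scale=8, size=15):
            rows.append({"therapy": therapy, "experience": experience, "satisfaction": score})
df = pd.DataFrame(rows)

# Two main effects plus their interaction: satisfaction ~ therapy + experience + therapy:experience
model = smf.ols("satisfaction ~ C(therapy) * C(experience)", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

# Eta-squared for each effect: SS(effect) / SS(total); the Residual row is the unaccounted-for SSW/SST
ss_total = anova_table["sum_sq"].sum()
print(anova_table["sum_sq"] / ss_total)
```

The F for each effect uses its own df(effect), while the error term (Residual) supplies the common df(error), matching the partitioning described above.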
THE ANOVA MODEL
• A person's score = overall mean + any systematic group effect + individual error
• Yij = µ + αj + εij
  o Yij = score on the DV for the ith person in the jth group
  o µ = grand mean
  o αj = effect parameter for group j
  o εij = random error for the ith person in the jth group
• All we have to do to calculate the DV score (which is usually the only data we have) is to add each part of the ANOVA model together: Yij = µ + αj + εij
• Important feature: in ANOVA, Σαj = 0
  o Simply because: αj = µj − µ
• If there are NO systematic effects of the IV on the DV, then all αj = 0 → Ho: α1 = α2 = α3 = .... = αJ = 0
• If there are systematic effects of the IV on the DV, then NOT all αj = 0 → HA: not Ho OR means not all equal OR αj not all 0

THE ANOVA MODEL: EXAMPLE
• To elaborate on this, and to see how we can test whether there are systematic effects, we start with an omniscient example where we know exactly how the ANOVA model works
• Imagine that there are three possible treatments of anxiety (no treatment, CBT, and group therapy). Our DV is 'freedom from anxiety' (where a higher score = less anxiety) and we know that:
  o 1) No treatment increases anxiety by 5 points → α1 = −5
  o 2) CBT decreases anxiety by 3 points → α2 = 3
  o 3) Group therapy decreases anxiety by 2 points → α3 = 2
• Let's say we also know the population mean freedom from anxiety is 10 points: μ = 10. Therefore:
  o μ1 = μ + α1 = 10 + (−5) = 5
  o μ2 = μ + α2 = 10 + 3 = 13
  o μ3 = μ + α3 = 10 + 2 = 12
• But with no error term, this means that all individuals within a group score the same
  o Note that this never happens in real life; in real life we have to estimate the systematic effects
• Error was built into the ANOVA model to make the model realistic (Yij = μ + αj + εij). Assumptions:
  o 1) Errors are independent
  o 2) Normally distributed
  o 3) Have a mean of zero, i.e. Σ(εij) = 0
  o 4) Homogeneity of variance (homoscedasticity)
  o Which can be summarised as: εij ~ N(0, σ²ε)
• [Example data table: scores for groups j = 1, 2 and 3]
  o Note that these data look like more realistic psychological data because there is variability both within groups (εij) and between groups (both αj and εij)

OMNIBUS F TEST (ANOVA)
• The question: are there systematic effects of the IV on the DV? (a simulation sketch follows this section)
• In order to make inferences about these two possibilities, we calculate an observed F ratio, which is the ratio of the between-groups variability to the within-groups variability
• We can then calculate the probability of obtaining this F ratio if Ho is true
  o If the p value > α, then we accept Ho
  o If the p value < α, then we reject Ho
• There are a number of steps involved in calculating the F ratio, which can be summarised in the following table:

    Source    SS                     df              MS              F
    Between   SSB = n Σ (Ȳj − Ȳ)²    dfB = J − 1     MSB = SSB/dfB   F = MSB/MSW
    Within    SSW = Σ Σ (Yij − Ȳj)²  dfW = J(n − 1)  MSW = SSW/dfW
    Total     SST = Σ Σ (Yij − Ȳ)²   dfT = N − 1
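A minimal simulation sketch of the model above, using the example's known parameters (µ = 10, α = (−5, 3, 2)). The error SD of 2 and the group size of 10 are invented, and scipy's f_oneway stands in for the hand calculation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

mu = 10.0                      # population mean 'freedom from anxiety'
alpha = [-5.0, 3.0, 2.0]       # group effects: no treatment, CBT, group therapy
n = 10                         # invented group size
sigma = 2.0                    # invented error SD

# Generate scores from the ANOVA model: Yij = mu + alpha_j + eps_ij
groups = [mu + a + rng.normal(0.0, sigma, size=n) for a in alpha]

for label, g in zip(["no treatment", "CBT", "group therapy"], groups):
    print(f"{label:>14s}: mean = {g.mean():.2f}")

# Omnibus F test: are the group means more different than error alone would predict?
F, p = stats.f_oneway(*groups)
print(f"F({len(groups) - 1}, {len(groups) * n - len(groups)}) = {F:.2f}, p = {p:.4f}")
```

Setting all the alpha values to 0 and re-running shows what "no systematic effects" looks like: the group means still differ a little (sampling variability), but F hovers around 1.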
ESTIMATES
• When all we have access to is the scores on the DV, then in order to get to a position where we can make inferences about whether or not there are systematic effects, we first need to estimate the population parameters
• To do this, the ANOVA model can be re-expressed in terms of variability:

    Yij = µ + αj + εij
    (Yij − µ) = (µj − µ) + (Yij − µj)
    Total variability = systematic between-group variability + unaccounted-for within-group variability

• The population mean, µ, and the group means, µj, can be estimated from the observed data:
  o the estimate of µ is Ȳ, the grand sample mean
  o the estimate of µj is Ȳj, the sample group mean
• So we can obtain estimates of the population variability via:

    (Yij − Ȳ) = (Ȳj − Ȳ) + (Yij − Ȳj)
    Total variability = systematic between-group variability + unaccounted-for within-group variability

• But deviation scores sum to zero, so these scores need to be squared first
• We can then work out SST, SSB and SSW using the formulas in the summary table above (see the sketch below)
• We can then turn these SS into variance estimates (mean squares, MS) by dividing by the relevant degrees of freedom, i.e. SS/df
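A small sketch verifying the decomposition above numerically, using the same invented mini data as earlier; the point is only that the per-score deviations, and therefore the sums of squares, add up exactly:

```python
import numpy as np

# Invented mini data: J = 3 groups of n = 2 scores
groups = [np.array([1.0, 3.0]), np.array([4.0, 6.0]), np.array([7.0, 9.0])]
grand_mean = np.concatenate(groups).mean()

ss_total = ss_between = ss_within = 0.0
for g in groups:
    for y in g:
        total_dev = y - grand_mean            # (Yij - Ybar)
        between_dev = g.mean() - grand_mean   # (Ybar_j - Ybar)
        within_dev = y - g.mean()             # (Yij - Ybar_j)
        assert np.isclose(total_dev, between_dev + within_dev)   # deviations decompose
        ss_total += total_dev ** 2
        ss_between += between_dev ** 2
        ss_within += within_dev ** 2

# Squaring and summing preserves the partition: SST = SSB + SSW
print(ss_total, ss_between, ss_within)        # 42.0 = 36.0 + 6.0
```

Dividing ss_between and ss_within by dfB = J − 1 and dfW = N − J then gives the mean squares used in the F ratio.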