Statistical tests
• give probabilities only
  ◦ you still have to interpret the results and decide whether to accept or reject a value, especially for small data sets (i.e. apply good judgement)
• may be helpful for questions such as:
  ◦ should an outlying value be rejected or retained in the calculation of the mean?
  ◦ are 2 samples, analysed by the same method, significantly different in composition?
  ◦ does a difference in precision exist between 2 data sets from different workers/methods?
Rejection/retention of outliers: Grubbs test
1. Compute the average, x̄, and standard deviation, s (both include the questionable value).
2. Gexp = |xq - x̄|/s, where xq = questionable result
3. Get Gcrit from Table 4-5
4. Apply test: reject xq if Gexp > Gcrit

Example: Fe in large mocked soil sample
%Pb = 1.1, 3.4, 6.9, 3.9, 2.7, 2.8, 1.2, 2.3
• should the 6.9 value be rejected?
• x̄ = 3.0; s = 1.9
• Gexp = |6.9 - 3.0|/1.9 = 2.05
• Gcrit for n = 8 is 2.032
• Reject because Gexp > Gcrit

BLIND APPLICATION OF TESTS IS DANGEROUS. USE GOOD COMMON SENSE.

In case of outliers:
• re-examine all data relating to the outlying value: properly kept lab notebook
• if possible, estimate the precision expected from the procedure, to check whether the outlying value is actually questionable
• repeat the analysis if sufficient sample and time are available
• otherwise, apply the Grubbs test
• if retention is indicated, consider reporting the median rather than the mean

Example
Police have a hit-and-run case and need to identify the brand of red auto paint. The percentages of iron oxide, which gives paint its red color, found during analysis are as follows: 43.15, 43.81, 45.71, 43.23, 42.99, and 43.56%. What is the average percentage of iron oxide in the paint sample?
a. 43.3
b. 43.7
c. 43.6

Example answer: Grubbs test
• sorted data: 42.99, 43.15, 43.23, 43.56, 43.81, 45.71%
• mean = 43.74; s = 1.0
• Gcalc = (45.71 - 43.74)/1.0 = 1.97
• Gcrit for n = 6 is 1.822
• Gcalc > Gcrit → reject 45.71
• Recalculate without 45.71: mean = 43.3 (s = 0.3), i.e. answer a
• the decrease in s from 1.0 to 0.3 is a further indication that 45.71 is an outlier
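The four steps of the Grubbs test can be sketched in Python. This is a minimal illustration rather than part of the original material: the names `grubbs_statistic`, `should_reject` and `G_CRIT` are hypothetical, and the lookup holds only the two critical values quoted in the examples (n = 8 and n = 6). Because the code keeps the unrounded mean and s, it gives G of about 2.10 for the soil data instead of the 2.05 obtained from the rounded values; the decision is the same.

```python
from statistics import mean, stdev

# Critical values quoted in the two examples (hypothetical lookup; any other n
# would have to be read from a table such as Table 4-5).
G_CRIT = {6: 1.822, 8: 2.032}


def grubbs_statistic(data, questionable):
    """Return G = |xq - mean| / s; mean and s include the questionable value."""
    return abs(questionable - mean(data)) / stdev(data)


def should_reject(data, questionable):
    """Reject the questionable value if G exceeds G_crit for n = len(data)."""
    return grubbs_statistic(data, questionable) > G_CRIT[len(data)]


soil = [1.1, 3.4, 6.9, 3.9, 2.7, 2.8, 1.2, 2.3]
paint = [43.15, 43.81, 45.71, 43.23, 42.99, 43.56]

for name, data in (("soil", soil), ("paint", paint)):
    xq = max(data)  # the suspect value is the largest one in both examples
    print(name, round(grubbs_statistic(data, xq), 2), should_reject(data, xq))

# Mean of the paint data once the rejected value is dropped
retained = [x for x in paint if x != 45.71]
print(round(mean(retained), 1))
```

The sample standard deviation (`statistics.stdev`, n - 1 in the denominator) is used, matching the s values in the examples.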
Hypothesis testing
• the "null hypothesis" assumes that the quantities compared are the same unless proven otherwise by testing at a given probability level:
  ◦ comparison of the mean from an analysis with µ (certified value of an SRM...)
  ◦ comparison of the means of 2 different sets
  ◦ comparison of standard deviations from 2 sets of data
  ◦ comparison of individual differences between two data sets

Comparison of mean with an accepted value: Student’s t test
• mean - μ = ± ts/√n
• if |mean - μ| > ts/√n
  ◦ null hypothesis wrong
  ◦ determinate error at this confidence level
• another way is to calculate tcalc and compare it to tcrit from Table 4-2
  ◦ tcalc = √n × |mean - μ|/s
  ◦ if tcalc > tcrit
    ▪ null hypothesis wrong at this confidence level

Example: New method for the determination of sulfur in kerosenes
• data: % S = 0.112, 0.118, 0.115, 0.119
• true = µ = 0.123% S
• is there a determinate error?
• mean = 0.116% S, s = 0.003% S
• tcalc = √4 × |0.116 - 0.123|/0.003 = 4.7
• tcrit for 3 degrees of freedom = 3.182 at 95% confidence level
• tcalc > tcrit: systematic error indicated
• At 99% level, tcrit = 5.84: tcalc < tcrit, so the null hypothesis holds

Q: A systematic error
a) can be discovered and corrected.
b) arises from the limitations on the ability to make a physical measurement.
c) is also known as an indeterminate error.

Example 2: Hg in mocked soil sample (small sample)
• data: % Hg = 7.5 ± 6.6 (n = 8)
• true = µ = 5.86% Hg
• is there a determinate error?
• tcalc = √8 × |7.5 - 5.86|/6.6 = 0.70
• tcrit for 7 degrees of freedom = 2.365 at 95% confidence level
• tcalc < tcrit: no systematic error indicated
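As a rough sketch, the tcalc expression can be evaluated in Python for both examples. The function name `t_calc` and the constant names are hypothetical; the critical values are simply those quoted in the examples. For sulfur the code uses the unrounded standard deviation (about 0.0032), so it returns roughly 4.4 rather than the 4.7 obtained with s rounded to 0.003; the conclusions at both confidence levels are unchanged. For Hg it evaluates √8 × |7.5 - 5.86|/6.6 directly from the summary values, which comes to about 0.70.

```python
from math import sqrt
from statistics import mean, stdev


def t_calc(sample_mean, s, n, mu):
    """t = sqrt(n) * |mean - mu| / s for comparing a mean with an accepted value."""
    return sqrt(n) * abs(sample_mean - mu) / s


# Sulfur in kerosene: raw data, so mean and s are computed without rounding
sulfur = [0.112, 0.118, 0.115, 0.119]
t_sulfur = t_calc(mean(sulfur), stdev(sulfur), len(sulfur), mu=0.123)

# Hg in soil: only the summary values (7.5 +/- 6.6, n = 8) are given
t_hg = t_calc(7.5, 6.6, 8, mu=5.86)

# Critical values as quoted in the examples (Table 4-2)
T_CRIT_3DF_95, T_CRIT_3DF_99, T_CRIT_7DF_95 = 3.182, 5.84, 2.365

# t value, then whether the null hypothesis is rejected at each level
print(round(t_sulfur, 2), t_sulfur > T_CRIT_3DF_95, t_sulfur > T_CRIT_3DF_99)
print(round(t_hg, 2), t_hg > T_CRIT_7DF_95)
```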
Choice of confidence level
• if too severe (99.9%)
  ◦ significant effect may be missed
• if too relaxed (50%)
  ◦ insignificant effect may be judged important
• In general
  ◦ result at 95% confidence level: SIGNIFICANT
  ◦ result at 99% confidence level: HIGHLY SIGNIFICANT

Comparison of the precision of two sets: F test
• Fcalc = s1²/s2²
  ◦ s1² = variance of set 1
  ◦ s2² = variance of set 2
  ◦ s1² > s2² (always put the larger variance in the numerator)
• compare to Fcrit (Table 4-4)
• if Fcalc > Fcrit: significant difference

Example of Hg in mocked soil sample: large vs small
• Small sample: 7.5 ± 6.6 %Hg, n=8, D.F.=7
• Large sample: 6.1 ± 3.0 %Hg, n=8, D.F.=7
• Fcalc = s1²/s2² = 6.6²/3.0² = 4.84
• Fcrit for 7 D.F. for s1 and 7 D.F. for s2 = 3.79
• Fcalc > Fcrit
• significant difference in precision between the two data sets at 95% confidence level
• if there is no significant difference → formula #1 (t test for similar s)
• if there is a significant difference → formula #2 (t test for different s)
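The F test on summary data is a one-line calculation; a minimal Python sketch follows. The helper name `f_calc` is hypothetical, and the critical value is the one quoted in the example for 7 and 7 degrees of freedom.

```python
def f_calc(s_a, s_b):
    """Return F = larger variance / smaller variance from two standard deviations."""
    var_a, var_b = s_a ** 2, s_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)


F_CRIT_7_7 = 3.79  # quoted in the example for 7 and 7 degrees of freedom

f = f_calc(6.6, 3.0)  # small vs large Hg sample
print(round(f, 2), "significant" if f > F_CRIT_7_7 else "not significant")
```

Taking the maximum and minimum inside the function enforces the "larger variance in the numerator" rule regardless of the order in which the two standard deviations are passed.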
Example of Hg vs Fe(II) in mocked soil sample (large sample)
• Hg: 6.1 ± 3.0 %Hg, n=8, D.F.=7
• Fe(II): 6.4 ± 1.8 %Fe(II), n=8, D.F.=7
• Fcalc = s1²/s2² = 3.0²/1.8² = 2.8
• Fcrit for 7 D.F. for s1 and 7 D.F. for s2 = 3.79
• Fcalc < Fcrit
• No significant difference in precision between the two data sets at 95% confidence level

Another example
Police have a hit-and-run case and need to identify the brand of red auto paint. What statistical test might they perform?
a. Grubbs test
b. t test
c. F test
F test: “significant difference in precision”
t test:
Comparison of 2 experimental means (with similar s)
• Hg: 6.1 ± 3.0 %Hg, n=8, D.F.=7
• Fe(II): 6.4 ± 1.8 %Fe(II), n=8, D.F.=7
• spooled = 2.47
• tcrit for D.F. = 8 + 8 - 2 = 14 is 2.150
• tcalc < tcrit
• No significant difference between the 2 means at the 95% confidence level
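A sketch of the calculation behind this example, assuming that formula #1 is the usual pooled-standard-deviation t test (the formula itself is not shown here, so this is an assumption). The pooled form is consistent with the 14 degrees of freedom and the value 2.47 in the example. The function name `pooled_t` is hypothetical.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (s_pooled, t_calc, degrees of freedom) for two means with similar s."""
    dof = n1 + n2 - 2
    # Pooled standard deviation: variances weighted by their degrees of freedom
    s_pooled = sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / dof)
    t = abs(mean1 - mean2) / s_pooled * sqrt(n1 * n2 / (n1 + n2))
    return s_pooled, t, dof


s_pooled, t, dof = pooled_t(6.1, 3.0, 8, 6.4, 1.8, 8)  # Hg vs Fe(II)
T_CRIT_14DF_95 = 2.150  # value quoted in the example

print(round(s_pooled, 2), round(t, 2), dof, t > T_CRIT_14DF_95)
```

Under this assumption tcalc comes out near 0.24, far below tcrit, which agrees with the conclusion of no significant difference.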
Comparison of 2 experimental means (with different s)
• Perform a t test: calculate t from data and compare to tcrit (Table 4-2)
• compare tcalc to tcrit for D.F. =
• if tcalc > tcrit → significant difference between the 2 means
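For orientation, one commonly used form of the unequal-variance t test is written out below. It is an assumption that these are the expressions intended here. Textbooks differ on the degrees-of-freedom formula: the version shown (n + 1 in the denominators, then subtract 2) is chosen only because it yields the 11 degrees of freedom quoted in the Hg example, whereas the Welch-Satterthwaite form (n - 1 in the denominators, nothing subtracted) would give about 10 for the same data. The result is rounded to the nearest integer.

```latex
t_{\text{calc}} = \frac{\lvert \bar{x}_1 - \bar{x}_2 \rvert}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}
\qquad
\text{D.F.} = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}
{\dfrac{\left( s_1^2/n_1 \right)^2}{n_1 + 1} + \dfrac{\left( s_2^2/n_2 \right)^2}{n_2 + 1}} - 2
```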
Example 2: Hg in mocked soil sample: large vs small
• Small sample: 7.5 ± 6.6 %Hg, n=8, D.F.=7
• Large sample: 6.1 ± 3.0 %Hg, n=8, D.F.=7
• tcrit for 11 degrees of freedom = 2.209
• tcalc < tcrit
• No significant difference between the 2 means at the 95% confidence level
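The numbers in this example can be checked with a short Python sketch. It assumes that formula #2 is t = |mean1 - mean2| / √(s1²/n1 + s2²/n2), and that the degrees of freedom use the variant with n + 1 in the denominators and 2 subtracted; both are assumptions, the second one chosen because it gives the 11 degrees of freedom quoted here. The function name `unequal_variance_t` is hypothetical.

```python
from math import sqrt


def unequal_variance_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t_calc, degrees of freedom) for two means with different s.

    The degrees-of-freedom expression (n + 1 denominators, minus 2) is an
    assumption; other texts use n - 1 denominators with nothing subtracted.
    """
    u1, u2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors of each mean
    t = abs(mean1 - mean2) / sqrt(u1 + u2)
    dof = (u1 + u2) ** 2 / (u1 ** 2 / (n1 + 1) + u2 ** 2 / (n2 + 1)) - 2
    return t, dof


t, dof = unequal_variance_t(7.5, 6.6, 8, 6.1, 3.0, 8)  # small vs large Hg sample
T_CRIT_11DF_95 = 2.209  # value quoted in the example

print(round(t, 2), round(dof), t > T_CRIT_11DF_95)
```

Under these assumptions tcalc is about 0.55, well below tcrit, in line with the stated conclusion.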
Paired t test for comparing individual differences
• Compute the differences between independent single results, the average of these differences and its associated standard deviation.
• Perform a t test: tcalc = (|d|/sd) × √n
  ▪ where |d| is the absolute value of the mean difference between results
  ▪ sd = standard deviation associated with |d|
  ▪ n = number of data pairs compared
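A minimal Python sketch of the paired t test. The function names are hypothetical. Since the %Fe example that follows gives only summary values (|d| = 1.7, sd = 2.9, 7 pairs), the calculation is shown from those; `paired_t` indicates how the same quantity would be obtained from raw paired results.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_from_summary(d_mean, s_d, n):
    """t = |mean difference| / s_d * sqrt(n), with n the number of pairs."""
    return abs(d_mean) / s_d * sqrt(n)


def paired_t(method_1, method_2):
    """Compute the paired t value from two equally long lists of paired results."""
    diffs = [a - b for a, b in zip(method_1, method_2)]
    return paired_t_from_summary(mean(diffs), stdev(diffs), len(diffs))


# Summary values of the %Fe example: |d| = 1.7, sd = 2.9, n = 7 pairs
t = paired_t_from_summary(1.7, 2.9, 7)
T_CRIT_6DF_95 = 2.447  # quoted for 6 degrees of freedom

print(round(t, 2), t > T_CRIT_6DF_95)
```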
Example: comparison of 2 methods to measure %Fe
• |d| = 1.7 %Fe
• sd = 2.9 %Fe
• tcalc = (|d|/sd) × √n = (1.7/2.9) × √7 = 1.6
• tcrit for 6 DF = 2.447
• tcalc < tcrit
• no significant difference at 95% confidence level

One final example
Titrator A obtains a mean value of 12.96% and a standard deviation of 0.05 for 5 replicate analyses of a sample. Titrator B obtains corresponding values of 13.12% and 0.08 for 7 replicate analyses. The true percent purity is 13.08. At the 95% confidence level, compared to titrator B, titrator A is:
a. as accurate and as precise
b. less accurate but as precise
c. as accurate but more precise
d. less accurate but more precise

Answer
• A: mean = 12.96%, sA = 0.05, n = 5
• B: mean = 13.12%, sB = 0.08, n = 7
• µ = 13.08
• Qualitatively, sA < sB: A appears more precise than B
• e = |mean - µ| = 0.12 for A vs 0.04 for B; hence B appears to be more accurate than A
BUT DIFFERENT N VALUES REQUIRE:
→ F test for difference in precision
→ Student's t test for difference vs true value

F test:
• Fcalc = s1²/s2² = (0.08)²/(0.05)² = 2.56
• Fcrit for 6 DF for s1 and 4 DF for s2: Fcrit = 6.16
• Fcalc < Fcrit → no significant difference in precision

t test vs true value: tcalc = √n × |mean - μ|/s for each:
• A: tcalc = 5.37; tcrit for 4 DF = 2.776 → tcalc > tcrit, so significant determinate error
• B: tcalc = 1.32; tcrit for 6 DF = 2.447 → tcalc < tcrit, so no determinate error

Hence A is less accurate but as precise: answer b.
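The whole answer can be reproduced from the summary data in a few lines of Python. This is only an illustrative sketch: the names `f_calc`, `t_vs_true`, `F_CRIT_6_4` and `T_CRIT` are hypothetical, and the critical values are the ones quoted in the answer.

```python
from math import sqrt

# Summary data from the question: (mean, s, n)
A = (12.96, 0.05, 5)
B = (13.12, 0.08, 7)
MU = 13.08  # true percent purity

# Critical values quoted in the answer (95% confidence level)
F_CRIT_6_4 = 6.16
T_CRIT = {4: 2.776, 6: 2.447}  # keyed by degrees of freedom


def f_calc(s_a, s_b):
    """Larger variance over smaller variance."""
    return max(s_a, s_b) ** 2 / min(s_a, s_b) ** 2


def t_vs_true(mean, s, n, mu):
    """t = sqrt(n) * |mean - mu| / s."""
    return sqrt(n) * abs(mean - mu) / s


same_precision = f_calc(A[1], B[1]) <= F_CRIT_6_4
a_biased = t_vs_true(*A, MU) > T_CRIT[A[2] - 1]
b_biased = t_vs_true(*B, MU) > T_CRIT[B[2] - 1]

# Precision indistinguishable, determinate error for A only
print(same_precision, a_biased, b_biased)
```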
Order:
1. Grubbs test
2. F test → compares precision! Doesn't say anything about the mean
3. Student's t test
  ◦ if no significant difference in F test: formula #1 (similar s)
  ◦ if there is a significant difference in F test: formula #2 (different s)