Final Exam Crib Sheet

MGCR 271 Crib Sheet By Kareem Halabi

Z-score: $z = \frac{x - \bar{x}}{s_x}$ (if $|z| > 3$ the value is an outlier). The z-scores of a data set always have $\bar{x} = 0$ and $s_x = 1$.

Simple Regression: Residual for $(x_i, y_i)$: $e_i = y_i - a - b x_i$. Ordinary Least Squares (OLS) regression: the goal is to minimize $\sum e_i^2$.

$\hat{y} = a + bx$, with $b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$ and $a = \frac{(\sum y) - b(\sum x)}{n}$
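A minimal Python sketch of these OLS formulas (numpy is assumed, and the x and y values are made-up illustration data):

```python
import numpy as np

# Made-up illustration data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Crib-sheet OLS formulas for slope b and intercept a
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
a = (np.sum(y) - b * np.sum(x)) / n

residuals = y - a - b * x            # e_i = y_i - a - b*x_i
print(a, b, np.sum(residuals**2))    # OLS minimizes the sum of squared residuals
```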

Confidence interval (z): $\bar{x} \pm z \frac{s_x}{\sqrt{n}}$, where z is the z-score of the area $\frac{1 - CL}{2}$.

Student's t distribution: similar to the normal distribution but has fatter tails. Different sample sizes lead to slightly different shapes for the t-distribution. Degrees of freedom: df = n - 1. $t = \frac{\bar{x} - \mu}{s_x / \sqrt{n}}$ (NEVER use σ with t). Confidence interval using t: $\bar{x} \pm t \frac{s_x}{\sqrt{n}}$.

Central Limit Theorem: a random sample of size n ≥ 30 is selected from an infinite (or large) population with mean μ and standard deviation σ. Let $\bar{x}$ be the mean of this random sample; even if the parent population is not normally distributed, $\bar{x}$ is approximately normally distributed, and both $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ (rare) and $\frac{\bar{x} - \mu}{s_x / \sqrt{n}}$ (frequent) have the same distribution as z.
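A small sketch of the z- and t-based intervals above, assuming scipy is available for the quantile lookups; the sample values are invented:

```python
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.2, 11.9])  # made-up data
n, xbar, s = len(sample), sample.mean(), sample.std(ddof=1)
CL = 0.95

# z is the z-score of the area (1 - CL)/2; take its absolute value for the margin
z = abs(stats.norm.ppf((1 - CL) / 2))
t = abs(stats.t.ppf((1 - CL) / 2, df=n - 1))   # df = n - 1

print("z interval:", (xbar - z * s / np.sqrt(n), xbar + z * s / np.sqrt(n)))
print("t interval:", (xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n)))
```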

Sample Size Formulas (Always use z):
1. $E = z \frac{s_x}{\sqrt{n}} \;\Rightarrow\; n = \left(\frac{z s_x}{E}\right)^2$ (always round up)
2. For binary data: $E = z \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \;\Rightarrow\; n = \frac{z^2 \bar{p}(1 - \bar{p})}{E^2}$ (round up)
Conservative sample size: if we don't know $\bar{p}$, use the worst-case scenario of 0.5 and (1 - 0.5). If there is $\tilde{p}$, a prior estimate of p: $n = \frac{z^2 \tilde{p}(1 - \tilde{p})}{E^2}$.

Binary Data: suppose a very large dataset comprises only 1s and 0s. Then $\mu = p$, where p is the proportion of 1s, and $\sigma = \sqrt{p(1 - p)}$. For a sample of size n, $s_x = \sqrt{\frac{n \bar{p}(1 - \bar{p})}{n - 1}}$, where $\bar{p}$ is the average of the sample. Ex: $z \approx \frac{\bar{p} - p}{\sqrt{\bar{p}(1 - \bar{p})/(n - 1)}}$ OR $\frac{\bar{p} - p}{\sqrt{p(1 - p)/n}}$.

Confidence intervals from a point estimate: $\bar{x} \pm z \frac{s_x}{\sqrt{n}}$; for binary data: $\bar{p} \pm z \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$.

Normal Distributions: the probability of any one specific outcome of a continuous random variable z is 0%, so use the Cumulative Distribution Function (CDF) and the density function for a range of values. CDF: $F(x) = P(z < x)$. The density function is the derivative of the CDF. The shape of the CDF is a sigmoid curve; the shape of the density function is a bell curve: $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$. Probabilities are found as areas under the density curve: first convert the data to z-scores, then look up the areas in the table. For percentiles, use the table backwards: treat the percentile as an area, look up the z-score, then convert it to a data value.

If a random sample of size n is drawn from a normal population, then $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ has the same distribution as z ($\bar{x}$ is the mean of the sample). Standard error of the sample mean: $\frac{s_x}{\sqrt{n}}$.

Estimation: the use of a single number calculated from a sample is called point estimation (a statistic). Unfortunately, it is rarely equal to the parameter it is estimating. An interval estimate has a margin of error. Confidence Level: an estimate of the probability that the confidence interval includes the true parameter value.

Hypothesis-Testing Procedure (keywords "significant", "statistically significant"):
1. Form appropriate null and alternative hypotheses. H0 by convention has ≥, ≤ or =, whereas Ha has >, < or ≠.
2. Determine the appropriate distribution (z, t, χ² or F).
3. Calculate the appropriate test statistic.
4. Using the level of significance α (0.05 by default) and tables of the distribution, form the rejection region:
   a. H0: θ ≤ θ0, Ha: θ > θ0 (right-tailed test). Tail area = α. Reject H0 if the test statistic is > the critical value from the table.
   b. H0: θ ≥ θ0, Ha: θ < θ0 (left-tailed test). Tail area = α. Reject H0 if the test statistic is < the critical value from the table.
   c. H0: θ = θ0, Ha: θ ≠ θ0 (two-tailed test). Tail areas = α/2. Reject H0 if the test statistic is > the right-hand critical value or < the left-hand critical value.
5. State the conclusion in ordinary language.

p-Value Hypothesis Testing: same procedure as regular hypothesis testing, except the test statistic is converted to the smaller tail area under the distribution: find the z/t/F/χ² score, then look up the area (this is the p-value).
1. For a right- or left-tailed test, reject H0 if p < α.
2. For a two-tailed test, reject H0 if 2p < α.

2-Sample Tests (used when comparing the means of two samples if n ≥ 30):
Proportions: $z = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\frac{\bar{p}_1(1-\bar{p}_1)}{n_1} + \frac{\bar{p}_2(1-\bar{p}_2)}{n_2}}}$ (unless using a pooled test)
Means: $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ (except when using the pooled variance, aka ANOVA)
Standard error for $\bar{x}_1 - \bar{x}_2$: $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
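A sketch of the two-sample z test for means with a p-value decision, following the formulas above; the summary statistics are made up and scipy supplies the normal CDF:

```python
from math import sqrt
from scipy import stats

# Made-up summary statistics for two large samples (n >= 30)
x1, s1, n1 = 52.3, 8.1, 45
x2, s2, n2 = 49.0, 7.4, 50
alpha = 0.05

se = sqrt(s1**2 / n1 + s2**2 / n2)    # standard error of x1_bar - x2_bar
z = (x1 - x2) / se

p = 1 - stats.norm.cdf(abs(z))        # smaller tail area = p-value
print(z, p, 2 * p < alpha)            # two-tailed test: reject H0 if 2p < alpha
```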

Student's t distribution: Theoretical Requirements: 1. The population should be approximately normally distributed. 2. The sample should be randomly selected. Useful for sample sizes n < 30.

Two-sample t-statistic (for a t-distribution, n < 30): $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Satterthwaite DF (truncate the result): $DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
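A sketch of this two-sample t statistic with the truncated Satterthwaite DF, on made-up small samples (scipy assumed for the t CDF):

```python
import numpy as np
from scipy import stats

# Made-up small samples (n < 30)
g1 = np.array([23.1, 25.4, 22.8, 26.0, 24.3, 23.9])
g2 = np.array([21.0, 22.5, 20.8, 23.1, 21.9, 22.2, 20.4])

m1, m2 = g1.mean(), g2.mean()
v1, v2 = g1.var(ddof=1), g2.var(ddof=1)
n1, n2 = len(g1), len(g2)

se2 = v1 / n1 + v2 / n2
t = (m1 - m2) / np.sqrt(se2)

# Satterthwaite degrees of freedom, truncated as on the crib sheet
df = int(se2**2 / ((v1 / n1)**2 / (n1 - 1) + (v2 / n2)**2 / (n2 - 1)))

p = 2 * (1 - stats.t.cdf(abs(t), df))   # two-tailed p-value
print(t, df, p)
```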

Multiple Regression: $\hat{y} = b_0 + b_1 x_1 + \cdots + b_k x_k$. T statistic for a coefficient: $t = \frac{b_j}{s_{b_j}}$, where $s_{b_j}$ is the standard error of $b_j$. F and p-values are used for testing whether the model is significant for predicting y: H0: the model is not significant for predicting y; Ha: the model is significant for predicting y. A change in R² between two models can be determined by examining the p-value of the removed variable: if p < α, the change is significant. The marginal contribution of a variable is its coefficient; CI for a marginal contribution: $b_j \pm t\, s_{b_j}$, with DF = Error DF.

Analysis of Variance for Regression (k = number of x variables, n = number of lines of data):

Source      DF     SS     MS                 F
Regression  k      SSR    MSR = SSR/k        MSR/MSE
Error       n-k-1  SSE    MSE = SSE/(n-k-1)
Total       n-1    SSTOT

R² (coefficient of determination) = SSR/SSTOT, aka the proportion of the variation in y that is explained by the regression relationship; it is the most common measure of the accuracy of a model.
SSR (explained variation / regression sum of squares) = R² · SSTOT
SSE (unexplained variation in y) = SSTOT - SSR
SSTOT (total variation in y) = $(\sum y^2) - \frac{(\sum y)^2}{n}$
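A sketch of the regression ANOVA quantities for a simple regression (k = 1), using the sums-of-squares formulas above on made-up data; scipy's F CDF supplies the model-significance p-value:

```python
import numpy as np
from scipy import stats

# Made-up data, one x variable (k = 1)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 2.9, 4.1, 4.8, 6.2, 6.8])
n, k = len(x), 1

b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
a = (np.sum(y) - b * np.sum(x)) / n

sstot = np.sum(y**2) - np.sum(y)**2 / n          # total variation in y
sse = np.sum((y - a - b * x)**2)                 # unexplained variation
ssr = sstot - sse                                # explained variation
r2 = ssr / sstot                                 # coefficient of determination

msr, mse = ssr / k, sse / (n - k - 1)
F = msr / mse
p = 1 - stats.f.cdf(F, k, n - k - 1)             # model-significance p-value
print(r2, F, p)
```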

ANOVA: used for testing whether several means are statistically different.
Theoretical Requirements:
1. The samples are taken from populations that are all normally distributed.
2. The samples are randomly and independently selected.
3. The variances of all the populations are roughly the same (homoscedasticity).
H0: μ1 = μ2 = μ3 ... (the means are not significantly different); Ha: at least one mean is significantly different.
ANOVA test statistic = MSTR/MSE. Compare to the F distribution: Numerator DF = number of treatments (k) - 1: NDF = k - 1; Denominator DF = number of data points (n) - k: DDF = n - k.

ANOVA Table Method (preferred method):

Source      DF    SS      MS                  F
Treatments  k-1   SSTR    MSTR = SSTR/(k-1)   MSTR/MSE
Error       n-k   SSE     MSE = SSE/(n-k)
Total       n-1   SSTOT

Sums of squares:
Treatment sum of squares: $SSTR = \sum \frac{(\text{sum of treatment})^2}{\#\text{ of data in treatment}} - \frac{(\text{sum of all})^2}{\text{total } \#\text{ of data}}$
Error sum of squares: $SSE = SSTOT - SSTR$
Total sum of squares: $SSTOT = (\text{sum of all squares}) - \frac{(\text{sum of all})^2}{\text{total } \#\text{ of data}}$

Shortcuts: the Error Mean Square (MSE) is the average of the squared standard deviations of the treatments, $\frac{\sum s_i^2}{k}$ (k = number of treatments); the Treatment Mean Square (MSTR) is $n_j\, \sigma(\bar{x}_j)^2$.
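A sketch of the one-way ANOVA table computed with the sums-of-squares formulas above, on three made-up treatment groups (scipy assumed for the F CDF):

```python
import numpy as np
from scipy import stats

# Made-up data for three treatments
groups = [np.array([4.1, 5.0, 4.6, 5.2]),
          np.array([5.8, 6.1, 5.5, 6.4]),
          np.array([4.9, 5.3, 5.1, 5.6])]

all_data = np.concatenate(groups)
n, k = len(all_data), len(groups)

correction = all_data.sum()**2 / n                       # (sum of all)^2 / total # of data
sstr = sum(g.sum()**2 / len(g) for g in groups) - correction
sstot = np.sum(all_data**2) - correction
sse = sstot - sstr

mstr, mse = sstr / (k - 1), sse / (n - k)
F = mstr / mse
p = 1 - stats.f.cdf(F, k - 1, n - k)                     # NDF = k-1, DDF = n-k
print(F, p)
```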

χ² (Chi-square) distribution: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O are observed frequencies and E are expected frequencies. $\chi^2 \approx z_1^2 + z_2^2 + \cdots$ (a sum of squared z-scores, so never negative).

Chi-square independence test (keywords "independent", "dependent", "depends on", "related to"):
ALWAYS: H0: A and B are independent; Ha: A and B are significantly dependent.
Example contingency table (observed frequencies):

      A1    A2
B1    O11   O12
B2    O21   O22

Example of expected frequencies: for cell (Ai, Bj), $E = \frac{A_i \cdot B_j}{n}$, where Ai is the column total and Bj is the row total.
DF = (rows - 1)(columns - 1). If the test statistic > the chi-square value in the table, reject the null hypothesis.
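A sketch of the chi-square independence test on a made-up 2x2 contingency table, computing the expected counts and statistic as described above (scipy gives the critical value):

```python
import numpy as np
from scipy import stats

# Made-up observed frequencies: rows = B1, B2; columns = A1, A2
O = np.array([[30.0, 20.0],
              [10.0, 40.0]])
alpha = 0.05

n = O.sum()
row_totals = O.sum(axis=1, keepdims=True)   # B totals
col_totals = O.sum(axis=0, keepdims=True)   # A totals
E = row_totals @ col_totals / n             # expected = (row total)(column total)/n

chi2 = np.sum((O - E)**2 / E)
df = (O.shape[0] - 1) * (O.shape[1] - 1)    # (rows - 1)(columns - 1)
critical = stats.chi2.ppf(1 - alpha, df)

print(chi2, critical, chi2 > critical)      # reject H0 (independence) if chi2 > critical
```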

Chi-squared goodness of fit test: if two sets of data are given, one a set of expected values and the other a set of observed values, use the χ² statistic. Never use proportions for the frequencies; if proportions are given, multiply them by the total number of data points to get the frequencies.
H0: the expected values are not significantly different from the observed values.
Ha: the expected values are significantly different from the observed values.

Standard error of estimate = $\sqrt{MSE}$
Prediction-Interval Formula (for simple regression): $(a + b x_0) \pm t\sqrt{MSE}\,\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{(\sum x^2) - \frac{(\sum x)^2}{n}}}$

ANOVA Confidence Intervals (must be used if the context of the question is ANOVA): turn α into a t value with DF equal to the Error DF (n - k).
1. $\bar{x}_j \pm t\sqrt{\frac{MSE}{n_j}}$
2. $\bar{x}_1 - \bar{x}_2 \pm t\sqrt{\frac{MSE}{n_1} + \frac{MSE}{n_2}}$
NOTE: $t_{DDF} = \sqrt{F_{1,DDF}}$ (if the F numerator DF = 1).

Standard error of the simple regression coefficient: $s_b = \sqrt{\frac{MSE}{(\sum x^2) - \frac{(\sum x)^2}{n}}}$

Power Test (the probability that the null hypothesis will be correctly rejected): say we have σ and n, with H0: μ ≤ a and Ha: μ > a, but μ actually = b. Power of this test = $P\!\left(z > \frac{a - b}{\sigma/\sqrt{n}} + 1.645\right)$.

Type I error: H0 is true, but an unlucky choice of sample makes the statistician reject H0.
Type II error: H0 is false, but an unlucky choice of sample makes the statistician not reject H0.

Significance for Variance: if one variance is 4 times another, the variances are significantly different.
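A sketch of the power calculation above, assuming scipy for the normal CDF; a, b, σ, and n are made-up numbers, and 1.645 is the z critical value for a one-tailed α = 0.05, as on the sheet:

```python
from math import sqrt
from scipy import stats

# Made-up setup: H0: mu <= a, Ha: mu > a, but mu is actually b
a, b = 100.0, 103.0
sigma, n = 8.0, 36

# Power = P(z > (a - b)/(sigma/sqrt(n)) + 1.645)
cutoff = (a - b) / (sigma / sqrt(n)) + 1.645
power = 1 - stats.norm.cdf(cutoff)
print(power)
```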