Soc202 Readings Notes 2

Chapter 7 - Using Probability Theory to Produce Sampling Distributions
(Focus on pages 206 - 215 (up to "Nominal Variables"); pages 219 – 222)

Point Estimates
Sampling error – the difference between the calculated value of a sample statistic and the true (unknown) value of a population parameter
Point estimate – a statistic provided without indicating a range of error (for example, a sample mean offered as the estimate of a population mean)
o There is variability in statistical outcomes from sample to sample

Predicting Sampling Error
Repeated sampling – drawing a sample and computing its statistics, then drawing a second sample, a third, a fourth, and so on (this shows that any single sample statistic is only an estimate)
Symbols to represent population parameters for interval/ratio variables
o μX = the mean of a population
o σX = the standard deviation of a population
Sampling error = X-bar – μX
Sampling error is patterned and systematic and therefore predictable
o 1) The resulting sample means are similar in value and tend to cluster around a particular value
o 2) Sampling variability is mathematically predictable from probability curves

Sampling distribution
From repeated sampling, a mathematical description of all possible sampling outcomes and the probability of each one
A raw score distribution is the distribution of the scores of each person in your sample

Sampling distribution for interval/ratio variables
Example: Physicians
o 48 years = mean of sample means
o The mean of a sampling distribution of means will always equal the population mean, μX
o As with any normal curve, the standard deviation is the distance from the mean to the point of inflection of the curve
o The sampling distribution of means (X-bar) describes all possible sampling outcomes and the probability of each outcome

Standard Error
Standard error – the standard deviation of a sampling distribution.
The standard error measures the spread of sampling error that occurs when a population is sampled repeatedly
o Equation – the estimated standard error of a sampling distribution of means is the sample's standard deviation divided by the square root of the sample size: s(X-bar) = s_X / sqrt(n)
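The standard error equation above can be sketched in a few lines of Python. This is only an illustration: the function name and the list of ages are hypothetical, and the sketch assumes the sample standard deviation is computed with n - 1 in the denominator (which is what `statistics.stdev` does).

```python
import math
import statistics


def estimated_standard_error(scores):
    """Estimated standard error of the mean: s_X / sqrt(n)."""
    n = len(scores)
    s_x = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    return s_x / math.sqrt(n)


# Hypothetical raw scores (ages of ten sampled people), for illustration only.
ages = [44, 51, 47, 39, 58, 49, 46, 52, 43, 51]
se = estimated_standard_error(ages)
print("sample mean:", statistics.fmean(ages))
print("estimated standard error:", round(se, 3))
```

The same standard deviation with a larger n would give a smaller result, because n sits under the square root in the denominator.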
Law of large numbers
The larger the sample size, the smaller the standard error
o Replacing n in the equation with a higher number makes the standard error decrease

The Central Limit Theorem
Regardless of the shape of the raw score distribution of an interval/ratio variable, its sampling distribution of means will be approximately normal when the sample size, n, is greater than 121 cases, and it will center on the true population mean.
The sampling distribution has a smaller range than the raw score distribution (which need not be a normal curve)
Raw score distributions are rectangular in shape when all scores have equal chances of being picked

Bean Counting as a Way of Grasping the Statistical Imagination
A sampling distribution is a probability distribution: it tells us how frequently to expect any and all sampling outcomes when we draw random samples
The central limit theorem essentially states that random sampling results in normal curves: symmetrical distributions that bunch in the middle and tail out to the sides
As long as the sample size is sufficiently large, the sampling distribution of proportions takes the bell shape of a normal curve

Distinguishing Among Populations, Samples, and Sampling Distributions
Sample = statistics; population = parameters; sampling distribution = hypothetical distribution of the outcomes of an infinite number of samples of size n.

Statistical Follies and Fallacies: Treating a point estimate as though it were absolutely true
No single statistic is the last word on estimating a parameter of the population
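Repeated sampling is easy to imitate on a computer, which gives a rough feel for the ideas in this chapter. The sketch below is an assumption-laden illustration, not the book's procedure: the population is a made-up rectangular distribution (scores 1 to 100, each equally likely), samples are drawn with replacement, and the function name, seed, and number of draws are arbitrary choices.

```python
import math
import random
import statistics


def simulate_sample_means(population, n, draws, seed=0):
    """Draw `draws` random samples of size n (with replacement); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]


# Hypothetical rectangular population: every score from 1 to 100 is equally likely.
population = list(range(1, 101))
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

results = {}
for n in (25, 122, 400):
    means = simulate_sample_means(population, n, draws=2000)
    results[n] = (statistics.fmean(means), statistics.pstdev(means))
    print(
        f"n={n}: mean of sample means={results[n][0]:.2f} (population mean {mu}), "
        f"spread of sample means={results[n][1]:.2f}, "
        f"sigma/sqrt(n)={sigma / math.sqrt(n):.2f}"
    )
```

In each run the sample means should cluster around the population mean, and their spread (the standard error) should shrink as n grows, even though the raw scores themselves are rectangular rather than bell-shaped.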
Chapter 8: Parameter Estimation Using Confidence Intervals Focus on pages 237 - 256 (up to "choosing sample size"); also, skip "confidence intervals of the mean for small samples" paragraph on page 251
The statistics of a sample are estimates. In this chapter we learn to say confidently just how close a single point estimate is to the true parameter, within a range of error.
Confidence interval – a range of possible values of a parameter expressed with a specific degree of confidence (we draw only one sample and compute a point estimate)
o With this, we take a point estimate and couple it with knowledge about the sampling distribution
o The objective is to estimate a population parameter within a specific span or “interval” of values
o Frequently used in exploratory studies
o The level of confidence – a calculated degree of confidence that a statistical procedure conducted with sample data will produce a correct result for the sampled population (its success rate)
Confidence Interval of a Population Mean
Sample statistics are the tools for answering: what is the value of μX?
The level of expected error – the difference between the stated level of confidence and “perfect confidence” of 100 percent.
Calculation: the level of confidence and the level of significance
o α symbolizes the level of expected error, also called the level of significance
o Level of confidence = 100% - α
o α = 100% - level of confidence
Calculation: the standard error for a confidence interval of a population mean
o Because the population standard deviation is unknown, the standard error is estimated from the sample standard deviation
o s(X-bar) = s_X / sqrt(n)
   s(X-bar) = estimated standard error of means for an interval/ratio variable X
   s_X = standard deviation of the sample
   n = sample size
Choosing the critical Z-score, Zα
o Confidence intervals are traditionally stated for 95% or 99% confidence
o At the 95% level, 95% of sample means fall within 1.96 standard errors of the population mean, so 0.0250 (2.5%) of outcomes fall beyond this score in each tail; ±1.96 is therefore the critical Z-score for the 95% level of confidence
o At the 99% level of confidence, the level of significance is 1% (.01); .005 falls in each tail, and the critical Z-score is ±2.58
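The critical Z-scores above normally come from the normal curve table in the book. As a cross-check, they can also be recovered from Python's standard library; the sketch below assumes a two-tailed interval (half of the expected error in each tail), and the function name is made up for illustration.

```python
from statistics import NormalDist


def critical_z(confidence_level):
    """Two-tailed critical Z-score for a level of confidence given as a proportion."""
    alpha = 1.0 - confidence_level  # level of significance (expected error)
    # Half of alpha sits in each tail, so look up the Z-score that leaves alpha / 2 above it.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


print(round(critical_z(0.95), 2))  # 1.96
print(round(critical_z(0.99), 2))  # 2.58
```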
Calculating the Error Term
Error term = (Zα)(s(X-bar))
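Calculating the Confidence Interval
The confidence interval for a population mean is the sample mean plus and minus an error term.
Formula: CI of μX = X-bar ± (Zα)(s(X-bar)), where Zα is the critical Z-score and s(X-bar) is the estimated standard error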
o Used when the research question calls for estimating a population parameter
o When the variable is of interval/ratio level of measurement
o When we are working with a single representative sample from one population
o When the sample size is greater than 121

The Five Steps for Computing a Confidence Interval of a Population Mean, μX
1) State the research question. Jot down the givens, including the population and sample under study, the variable (X), its level of measurement, and given or calculated statistics
2) Compute the standard error and error term
3) Compute the LCL (lower confidence limit) and UCL (upper confidence limit) of the confidence interval
4) Provide an interpretation in everyday language
5) Provide a statistical interpretation illustrating the notion of “confidence in the procedure”
See chart in book and draw it
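Steps 2 and 3 above can be sketched as a short calculation. Everything specific here is hypothetical: the function name, the givens (mean 48.0, standard deviation 10.0, n = 400), and the choice to hard-code the two table values of Zα rather than look them up. The sketch also enforces the notes' requirement that n be greater than 121.

```python
import math

# Critical Z-scores as read from a normal curve table (two-tailed).
CRITICAL_Z = {0.95: 1.96, 0.99: 2.58}


def confidence_interval_mean(sample_mean, sample_sd, n, confidence_level=0.95):
    """Return (LCL, UCL) for a population mean from a single large sample."""
    if n <= 121:
        raise ValueError("this large-sample procedure assumes n > 121")
    standard_error = sample_sd / math.sqrt(n)                   # step 2: standard error
    error_term = CRITICAL_Z[confidence_level] * standard_error  # step 2: error term
    return sample_mean - error_term, sample_mean + error_term   # step 3: LCL and UCL


# Hypothetical givens (step 1): X = age in years, X-bar = 48.0, s_X = 10.0, n = 400.
for level in (0.95, 0.99):
    lcl, ucl = confidence_interval_mean(48.0, 10.0, 400, confidence_level=level)
    print(f"{level:.0%} CI: {lcl:.2f} to {ucl:.2f} years")
```

Running both levels on the same givens shows the 99% interval coming out wider than the 95% interval, since only Zα changes.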
Proper Interpretation of Confidence Intervals
Statistical interpretation – expressing confidence in the procedure by stating a percentage, i.e., “I am 95% confident ...”
o The procedure is correct 95% of the time, with 5% error
o 95 times out of 100, a confidence interval computed this way will contain the true parameter, because 95% of sample means fall within 1.96 standard errors of it
Common Misinterpretations of Confidence Intervals
CIs address the size of parameters, not individual scores
o Say “I am 95% sure that the mean of ....”, not “95% of these people ...”
o A CI is a statement about a SUMMARY statistic, not about individuals

Chosen Level of Confidence and the Precision of the Confidence Interval
A Z-score measures how far off a sample mean is from the true population mean, in standard errors. With the help of the normal distribution probability table, these scores determine the probability of occurrence of sampling outcomes
Once a sample has been drawn, its mean, standard deviation, and sample size are “givens”
The greater the chosen level of confidence, the larger the Zα. A larger Zα = a larger error term and a less precise CI
The relationship between the level of confidence and the degree of precision:
o The greater the stated level of confidence, the greater the error term and therefore the less precise the confidence interval.

Sample Size and the Precision of the Confidence Interval
To obtain a high degree of precision and maintain a high level of confidence: make sure before collecting data that the sample size is sufficiently large to produce small standard errors and precise confidence intervals.
The relationship between sample size and degree of precision:
o The larger the sample size, the more precise the confidence interval.

Large-Sample Confidence Interval of a Population Proportion
Nominal and ordinal variables
o The CI provides an estimate of the proportion of a population that falls in the “success” category of the variable.
o Calculation for the confidence interval of a population proportion; use when:
   1) We are to provide an interval estimate of the value of a population parameter, Pu, where P = p [of the success category]
   2) We have a single representative sample from one population
   3) The sample size (n) is sufficiently large that (p smaller)(n) is greater than or equal to 5, where p smaller is the smaller of P and Q = 1 - P, resulting in a sampling distribution that is approximately normal (the only restriction)
o Then compute an estimated standard error and calculate the error term. (See pages 253/254 for formulas and examples)
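The notes defer to the book for the proportion formulas, so the sketch below is an assumption: it uses the standard large-sample estimate of the standard error, sqrt(P × Q / n) with Q = 1 - P, which may differ in notation from pages 253/254. The function name and the poll numbers are hypothetical.

```python
import math

# Critical Z-scores as read from a normal curve table (two-tailed).
CRITICAL_Z = {0.95: 1.96, 0.99: 2.58}


def confidence_interval_proportion(successes, n, confidence_level=0.95):
    """Return (LCL, UCL) for a population proportion from one large sample."""
    p = successes / n  # sample proportion in the "success" category
    q = 1.0 - p        # proportion in all other categories
    # Restriction from the notes: (p smaller)(n) must be at least 5.
    if min(p, q) * n < 5:
        raise ValueError("sample too small: (p smaller)(n) is below 5")
    standard_error = math.sqrt(p * q / n)  # assumed standard large-sample form
    error_term = CRITICAL_Z[confidence_level] * standard_error
    return p - error_term, p + error_term


# Hypothetical poll: 220 of 400 respondents fall in the "success" category.
lcl, ucl = confidence_interval_proportion(successes=220, n=400)
print(f"95% CI: {lcl:.3f} to {ucl:.3f}")
```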
Choosing a Sample Size for Polls, Surveys, and Research Studies
Sample size for a confidence interval of a population proportion
o A large sample size is better because it produces a small standard error (n is in the denominator)
o The chosen precision (plus or minus 1 percent error, ...) hinges on the size of the error term of the confidence interval equation
o To figure out how large a sample you need for a certain percent error, see page 257
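The book's sample-size formula is on page 257 and is not reproduced in these notes. One common way to get it, assumed here, is to set the error term Z × sqrt(P × Q / n) equal to the desired error and solve for n, giving n = P × Q × Z² / (error)². The function name is hypothetical, and P = 0.5 is assumed as the most cautious guess because it makes P × Q as large as possible.

```python
import math


def required_sample_size(desired_error, critical_z=1.96, p=0.5):
    """Smallest n whose error term for a proportion is at most desired_error."""
    q = 1.0 - p
    n = p * q * critical_z ** 2 / desired_error ** 2
    return math.ceil(n)  # round up: a fraction of a respondent is not possible


# Hypothetical targets: plus or minus 5 points and plus or minus 3 points at 95% confidence.
for error in (0.05, 0.03):
    print(f"±{error:.0%} needs n of at least {required_sample_size(error)}")
```

Tightening the desired error from 5 points to 3 points nearly triples the required sample, which is why high precision is expensive.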
Statistical Follies and Fallacies: It is plus and minus the error term
A common mistake is to treat the error term as equal to the width of the confidence interval itself; the interval actually extends one error term on each side of the point estimate. When two confidence intervals overlap (the higher estimate minus its error term falls below the lower estimate plus its error term), there is not sufficient evidence to conclude that one has a greater chance
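A tiny check makes the folly concrete. The poll numbers and function name below are made up for illustration: two estimates four points apart, each with an error term of three points.

```python
def intervals_overlap(ci_a, ci_b):
    """True when two (LCL, UCL) intervals share at least one value."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Hypothetical poll: candidate A at 52%, candidate B at 48%, error term of 3 points each.
error_term = 0.03
ci_a = (0.52 - error_term, 0.52 + error_term)
ci_b = (0.48 - error_term, 0.48 + error_term)

width_a = ci_a[1] - ci_a[0]
print("width of A's interval:", round(width_a, 2))           # twice the error term
print("intervals overlap:", intervals_overlap(ci_a, ci_b))   # True
```

The interval is twice as wide as the error term, and although A leads by four points, the two intervals still overlap.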
Chapter 9: All, but don't worry too much about "type 1" versus "type 2" errors. It is more important to get the main ideas behind hypothesis tests, p-values, critical regions, etc.