Using Probability Theory to Produce Sampling Distributions

Soc202 Readings notes 2

Chapter 7 - Using Probability Theory to Produce Sampling Distributions
(Focus on pages 206 - 215 (up to "Nominal Variables"); pages 219 - 222)

Point Estimates
• Sampling error – the difference between the calculated value of a sample statistic and the true (unknown) value of a population parameter
• Point estimate – a statistic provided without indicating a range of error (e.g., a sample mean reported as the estimate of a population mean)
  o There is variability in statistical outcomes from sample to sample

Predicting Sampling Error
• Repeated sampling – drawing a sample and computing its statistics, then drawing a second sample, a third, a fourth, and so on (this shows that any one sample provides only an estimate)
• Symbols to represent population parameters for interval/ratio variables
  o μX = the mean of a population
  o σX = the standard deviation of a population
• Sampling error = X̄ – μX
• Sampling error is patterned and systematic and is therefore predictable
  o 1) The resulting sample means are similar in value and tend to cluster around a particular value
  o 2) Sampling variability is mathematically predictable from probability curves

Sampling Distribution
• From repeated sampling, a mathematical description of all possible sampling outcomes and the probability of each one
• A raw score distribution, by contrast, is the distribution of the scores of each person in your sample

Sampling Distribution for Interval/Ratio Variables
• Example: Physicians
  o 48 years = mean of sample means
  o The mean of a sampling distribution of means will always equal the population mean, μX
  o As with any normal curve, the standard deviation is the distance from the mean to the point of inflection of the curve
  o The sampling distribution of means (X̄) describes all possible sampling outcomes and the probability of each outcome

Standard Error
• Standard error – the standard deviation of a sampling distribution. The standard error measures the spread of sampling error that occurs when a population is sampled repeatedly
  o Equation – the estimated standard error of a sampling distribution of means is the sample's standard deviation divided by the square root of the sample size: s(X̄) = sX / √n
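To make the idea of repeated sampling concrete, here is a minimal Python sketch. The population is made up (10,000 simulated ages centered on 48, loosely echoing the physicians example), and the function name `estimated_standard_error` is only an illustration. It compares the spread of many sample means with the standard error formula.

```python
import math
import random
import statistics

def estimated_standard_error(sample):
    """Estimated standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical population of 10,000 ages (made-up numbers, for illustration only).
rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = [rng.gauss(48, 10) for _ in range(10_000)]

# Repeated sampling: draw many samples of size n and record each sample mean.
n = 100
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(2_000)]

# The standard deviation of the sample means is the (observed) standard error.
observed_se = statistics.stdev(sample_means)
# What the formula predicts when the population standard deviation is known.
theoretical_se = statistics.pstdev(population) / math.sqrt(n)
# What we can actually compute in practice: an estimate from one sample.
one_sample_se = estimated_standard_error(rng.sample(population, n))

print(f"spread of sample means: {observed_se:.3f}")
print(f"sigma / sqrt(n):        {theoretical_se:.3f}")
print(f"estimate from 1 sample: {one_sample_se:.3f}")
```

The three printed values should be close to one another, which is why a single sample is enough to estimate the standard error.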

Law of Large Numbers
• The larger the sample size, the smaller the standard error
• Replacing n in the standard error equation with a larger number makes the standard error smaller

The Central Limit Theorem
• Regardless of the shape of a raw score distribution of an interval/ratio variable, its sampling distribution will be normal when the sample size, n, is greater than 121 cases and will center on the true population mean
• The sampling distribution has a smaller range than the raw score distribution (which need not be a normal curve)
• A raw score distribution is rectangular in shape when every score has an equal chance of being picked

Bean Counting as a Way of Grasping the Statistical Imagination
• A sampling distribution is a probability distribution: it tells us how frequently to expect any and all sampling outcomes when we draw random samples
• The central limit theorem essentially states that random sampling results in normal curves: symmetrical distributions that bunch in the middle and tail out to the sides
• As long as the sample size is sufficiently large, the sampling distribution of proportions takes the bell shape of a normal curve

Distinguishing Among Populations, Samples, and Sampling Distributions
• Sample = statistics; population = parameters; sampling distribution = hypothetical distribution of the outcomes of an infinite number of samples of size n

Statistical Follies and Fallacies: Treating a Point Estimate as Though It Were Absolutely True
• No single statistic is the last word on estimating a parameter of the population
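The law of large numbers and the central limit theorem can be checked with a small simulation. The sketch below assumes a rectangular raw score distribution (whole numbers 1 to 10, all equally likely), which is an illustrative choice and not an example from the book. It shows that the sample means center on the population mean, that their spread shrinks as n grows, and that about 95% of them fall within 1.96 standard errors.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Rectangular raw score distribution: whole numbers 1..10, all equally likely.
POP_MEAN = 5.5
POP_SD = math.sqrt((10**2 - 1) / 12)  # standard deviation of a discrete uniform on 1..10

def sample_means(n, repetitions=2_000):
    """Return the means of `repetitions` random samples of size n."""
    return [
        statistics.fmean(rng.randint(1, 10) for _ in range(n))
        for _ in range(repetitions)
    ]

results = {}
for n in (10, 122, 400):
    means = sample_means(n)
    center = statistics.fmean(means)      # should sit near the population mean
    spread = statistics.stdev(means)      # observed standard error
    expected = POP_SD / math.sqrt(n)      # sigma / sqrt(n)
    # Share of sample means within 1.96 standard errors of the population mean.
    within = sum(abs(m - POP_MEAN) <= 1.96 * expected for m in means) / len(means)
    results[n] = (center, spread, within)
    print(f"n={n:3d}  center={center:.2f}  spread={spread:.3f}  "
          f"sigma/sqrt(n)={expected:.3f}  within 1.96 SE={within:.1%}")
```

Even though the raw scores are flat rather than bell-shaped, the sample means bunch in the middle and tail out to the sides.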

Chapter 8: Parameter Estimation Using Confidence Intervals
(Focus on pages 237 - 256 (up to "choosing sample size"); also, skip "confidence intervals of the mean for small samples" paragraph on page 251)

The statistics of a sample are estimates. In this chapter we learn to say confidently just how close a single point estimate is to the true parameter, within a range of error.
• Confidence interval – a range of possible values of a parameter expressed with a specific degree of confidence (we draw only one sample and compute a point estimate)
  o We take a point estimate and couple it with knowledge about sampling distributions
  o The objective is to estimate a population parameter within a specific span or “interval” of values
  o Frequently used in exploratory studies
  o The level of confidence – a calculated degree of confidence that a statistical procedure conducted with sample data will produce a correct result for the sampled population (a success rate)

Confidence Interval of a Population Mean
• Sample statistics are the tools for answering the question: what is the value of μX?
• The level of expected error – the difference between the stated level of confidence and “perfect confidence” of 100 percent
• Calculation: the level of confidence and the level of significance
  o α symbolizes the level of expected error, also called the level of significance
  o Level of confidence = 100% – α
  o α = 100% – level of confidence
• Calculation: the standard error for a confidence interval of a population mean
  o Because the population standard deviation σX is unknown, the standard error is estimated from the sample
  o s(X̄) = sX / √n
    ▪ s(X̄) = estimated standard error of means for an interval/ratio variable X
    ▪ sX = standard deviation of a sample
    ▪ n = sample size
• Choosing the critical Z-score, Zα
  o Confidence intervals are traditionally stated for 95% or 99% confidence
  o 95% of sample means fall within 1.96 standard errors of the mean, so 0.0250 (2.5%) fall beyond this score in each tail
    ▪ The ± value, in this case ±1.96, is referred to as the critical Z-score for the 95% level of confidence
  o For the 99% level of confidence, the level of significance is 1% (.01), .005 falls in each tail, and the critical Z-score is ±2.58
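The critical Z-scores above normally come from a normal curve table. As a cross-check, here is a minimal sketch that computes them with Python's standard library; the function name `critical_z` is an assumption made for this example.

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """Two-tailed critical Z-score for a level of confidence given as a proportion."""
    alpha = 1.0 - confidence_level  # level of significance (expected error)
    # Half of alpha sits in each tail, so look up the score that leaves
    # alpha / 2 of the standard normal curve above it.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.95, 0.99):
    print(f"{level:.0%} confidence: critical Z = +/-{critical_z(level):.2f}")
```

Rounded to two decimals, the results agree with the table values of 1.96 and 2.58 used in these notes.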

Calculating the Error Term
• Error term = (Zα)(s(X̄)), the critical Z-score multiplied by the estimated standard error
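A short sketch of the error term calculation, using hypothetical givens (a sample standard deviation of 10 and sample sizes of 400 and 1,600) that are not from the book. It also shows how the error term responds to the level of confidence and to the sample size.

```python
import math

# Critical Z-scores as given in these notes.
CRITICAL_Z = {0.95: 1.96, 0.99: 2.58}

def error_term(sample_sd, n, confidence_level):
    """Error term = critical Z-score times the estimated standard error."""
    standard_error = sample_sd / math.sqrt(n)
    return CRITICAL_Z[confidence_level] * standard_error

# Hypothetical givens: sample standard deviation of 10.
for n in (400, 1600):
    for level in (0.95, 0.99):
        print(f"n={n:4d}  {level:.0%} confidence  "
              f"error term = {error_term(10, n, level):.2f}")
```

A higher level of confidence gives a larger error term, while a larger sample gives a smaller one.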

Calculating the Confidence Interval
• A confidence interval for a population mean is a sample mean plus and minus an error term
• Formula: confidence interval of μX = X̄ ± (Zα)(s(X̄))
  o Used when the research question calls for estimating a population parameter
  o When the variable is of interval/ratio level of measurement
  o When we are working with a single representative sample from one population
  o When the sample size is greater than 121

The Five Steps for Computing a Confidence Interval of a Population Mean, μX
• 1) State the research question. Jot down the givens, including the population and sample under study, the variable (X), its level of measurement, and given or calculated statistics
• 2) Compute the standard error and error term
• 3) Compute the lower confidence limit (LCL) and upper confidence limit (UCL) of the confidence interval
• 4) Provide an interpretation in everyday language
• 5) Provide a statistical interpretation illustrating the notion of “confidence in the procedure”
• See chart in book and draw it
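The five steps can be sketched in a few lines of Python. The givens below (a sample of 400 physicians with a mean age of 48 and a standard deviation of 10) are hypothetical numbers chosen for easy arithmetic, and the function name is an assumption for this example.

```python
import math

def confidence_interval_of_mean(sample_mean, sample_sd, n, critical_z=1.96):
    """Return (LCL, UCL) for a population mean from a single large sample."""
    if n <= 121:
        raise ValueError("this large-sample procedure assumes n > 121")
    standard_error = sample_sd / math.sqrt(n)   # step 2: standard error
    error_term = critical_z * standard_error    # step 2: error term
    return sample_mean - error_term, sample_mean + error_term  # step 3

# Step 1: hypothetical givens (X = age, an interval/ratio variable).
lcl, ucl = confidence_interval_of_mean(sample_mean=48.0, sample_sd=10.0, n=400)

# Step 4: everyday language.
print(f"I am 95% sure the mean age of all physicians is between "
      f"{lcl:.2f} and {ucl:.2f} years.")
# Step 5: statistical interpretation (confidence in the procedure).
print("If this procedure were repeated many times, about 95% of the "
      "intervals it produced would contain the true population mean.")
```

Passing `critical_z=2.58` gives the 99% interval, which is wider for the same givens.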

Proper Interpretation of Confidence Intervals
• Statistical interpretation – expressing confidence in the procedure by stating a percentage, i.e., “I am 95% confident ...”
  o The procedure is correct 95% of the time, with 5% error
  o With repeated sampling, about 95 out of 100 confidence intervals would contain the true parameter, because 95% of sample means fall within 1.96 standard errors of μX
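The "95 times out of 100" idea can be checked by simulation. This sketch assumes a made-up population (normal, mean 48, standard deviation 10) whose true mean is known to us, draws many samples of 150, and counts how often the 95% confidence interval captures the true mean.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

TRUE_MEAN, TRUE_SD = 48.0, 10.0  # hypothetical population parameters
n = 150                          # sample size (greater than 121)
repetitions = 2_000

hits = 0
for _ in range(repetitions):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    mean = statistics.fmean(sample)
    error_term = 1.96 * statistics.stdev(sample) / math.sqrt(n)
    # Does this sample's interval contain the true population mean?
    if mean - error_term <= TRUE_MEAN <= mean + error_term:
        hits += 1

coverage = hits / repetitions
print(f"{coverage:.1%} of the intervals contained the true mean")
```

Any single interval either contains the true mean or it does not; the 95% describes the success rate of the procedure.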

Common Misinterpretations of Confidence Intervals
• A CI addresses the size of parameters, not individual scores
• Say “I am 95% sure that the mean of ...” and not “95% of these people ...”
• A CI addresses a SUMMARY statistic, not individual scores

Chosen Level of Confidence and the Precision of the Confidence Interval
• Z-scores measure how far off a sample mean is from the true population mean. With the help of the normal distribution probability table, these scores determine the probability of occurrence of sampling outcomes
• Once a sample has been drawn, its mean, standard deviation, and sample size are “givens”
• The greater the chosen level of confidence, the larger the Zα. A larger Zα = a larger error term and a less precise CI
• The relationship between the level of confidence and the degree of precision:
  o The greater the stated level of confidence, the greater the error term and therefore the less precise the confidence interval

Sample Size and the Precision of the Confidence Interval
• To obtain a high degree of precision and maintain a high level of confidence: make sure before collecting data that the sample size is sufficiently large to produce small standard errors and precise confidence intervals
• The relationship between sample size and degree of precision:
  o The larger the sample size, the more precise the confidence interval

Large-Sample Confidence Interval of a Population Proportion
• Nominal and ordinal variables
  o The CI provides an estimate of the proportion of a population that falls in the “success” category of the variable
  o When to calculate a confidence interval of a population proportion
    ▪ 1) We are to provide an interval estimate of the value of a population parameter, Pu, where P = p [of the success category]
    ▪ 2) We have a single representative sample from one population
    ▪ 3) The sample size (n) is sufficiently large that (p smaller)(n) is greater than or equal to 5, where p smaller is the smaller of the success proportion and its complement, resulting in a sampling distribution that is normal (the only restriction)
  o Then compute an estimated standard error and calculate the error term (see pages 253/254 for formulas and examples)
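As a rough sketch of the proportion case, the code below uses the widely taught large-sample formula, standard error = √(P × Q / n) with Q = 1 – P. This is an assumption about what the book presents on pages 253/254, so check the notation there. The poll numbers (240 "successes" out of 400) are hypothetical.

```python
import math

def confidence_interval_of_proportion(successes, n, critical_z=1.96):
    """Return (LCL, UCL) for a population proportion from one large sample."""
    p = successes / n   # sample proportion in the "success" category
    q = 1.0 - p         # proportion in all other categories
    # Sample size restriction from the notes: (p smaller)(n) >= 5.
    if min(p, q) * n < 5:
        raise ValueError("sample too small: (p smaller)(n) must be at least 5")
    standard_error = math.sqrt(p * q / n)   # assumed large-sample formula
    error_term = critical_z * standard_error
    return p - error_term, p + error_term

# Hypothetical poll: 240 of 400 respondents fall in the success category.
lcl, ucl = confidence_interval_of_proportion(successes=240, n=400)
print(f"95% CI of the population proportion: {lcl:.3f} to {ucl:.3f}")
```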

Choosing a Sample Size for Polls, Surveys, and Research Studies
• Sample size for a confidence interval of a population proportion
  o A large sample size is better because it produces a small standard error (n is in the denominator)
  o The chosen precision (e.g., plus or minus 1 percent error) hinges on the size of the error term of the confidence interval equation
  o To figure out how large a sample you need for a certain percent error, see page 257
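One common way to work out the needed sample size is to solve the error term equation for n, giving n = P × Q × Z² / (error term)². This is the standard textbook rearrangement and is assumed here; the presentation on page 257 may differ. Using P = 0.5 is a conservative default because it makes P × Q as large as possible.

```python
import math

def sample_size_for_proportion(error_term, critical_z=1.96, p=0.5):
    """Smallest n whose error term is no larger than `error_term`.

    Assumes the large-sample standard error sqrt(p * q / n); p = 0.5 is a
    conservative default, not a value taken from the book.
    """
    q = 1.0 - p
    return math.ceil(p * q * critical_z**2 / error_term**2)

for margin in (0.05, 0.03, 0.01):
    print(f"plus or minus {margin:.0%}: n = {sample_size_for_proportion(margin)}")
```

Cutting the error term sharply raises the required sample size, because n grows with the square of the desired precision.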

Statistical Follies and Fallacies: It Is Plus and Minus the Error Term
• A common mistake is to treat the error term as equal to the width of the confidence interval itself. The interval extends plus and minus the error term, so its width is twice the error term
• When two confidence intervals overlap (the lower limit of the higher estimate falls below the upper limit of the lower estimate), there is insufficient evidence to conclude that one has a greater chance
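A small sketch of this fallacy, using a hypothetical poll (two candidates at 52% and 48%, each with an error term of 3 percentage points). The helper names are assumptions for this example.

```python
def interval(point_estimate, error_term):
    """Confidence interval as (LCL, UCL): the estimate plus and minus the error term."""
    return point_estimate - error_term, point_estimate + error_term

def intervals_overlap(a, b):
    """True when two (LCL, UCL) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical poll results, as proportions.
candidate_a = interval(0.52, 0.03)
candidate_b = interval(0.48, 0.03)

# The interval is twice as wide as the error term.
width_a = candidate_a[1] - candidate_a[0]
print(f"error term = 0.03, interval width = {width_a:.2f}")

if intervals_overlap(candidate_a, candidate_b):
    print("The intervals overlap: too close to say who is ahead.")
else:
    print("The intervals do not overlap.")
```

A 4-point gap looks larger than a 3-point error term, but the two intervals still overlap once the error term is applied to both estimates.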

Chapter 9: All, but don't worry too much about "type 1" versus "type 2" errors. It is more important to get the main ideas behind hypothesis tests, p-values, critical regions, etc.