Chapter 5 Modeling Variation with Probability

Why Randomness
-> Random selection helps make sure the collected samples represent the whole population
Randomness: no predictable pattern occurs, e.g. rolling dice, flipping heads or tails

Probability: theoretical and empirical
Theoretical probability: long-run relative frequencies based on theory, e.g. a flipped coin: in the long run the relative frequencies of heads and tails settle at 50% and 50%
- A probability is always between 0 and 1
- Complement: the event that a given event does not happen, i.e. all the other outcomes; P(not A) = 1 - P(A)
- Equally likely outcomes: each outcome has the same chance of occurring, e.g. each number on a fair die has an equal chance of coming up
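A minimal sketch of the coin and die examples above, assuming a fair coin simulated with Python's `random` module. The function name `relative_frequency_of_heads` and the fixed seed are illustrative choices, not part of the notes. It shows the relative frequency of heads settling near the theoretical 0.5 as the number of flips grows, and works out a complement for one fair die.

```python
import random
from fractions import Fraction

def relative_frequency_of_heads(n_flips, seed=0):
    """Simulate n_flips fair coin flips and return the fraction that were heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# Equally likely outcomes: each face of a fair die has probability 1/6.
die_faces = [1, 2, 3, 4, 5, 6]
p_six = Fraction(1, len(die_faces))
p_not_six = 1 - p_six  # complement: the event "a six" does not happen

for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
print("P(six) =", p_six, " P(not six) =", p_not_six)
```

With few flips the relative frequency can be far from 0.5; with many flips it should sit close to it.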

Probability rules:
1. And: the probability that both events occur
2. Or: the probability that at least one of the events occurs
3. Mutually exclusive events: the probability that both events occur is 0
Conditional probability: the probability of event A given that event B has occurred; relevant when events A and B are associated
Independent events: events A and B are independent -> whether B occurs doesn't influence the probability of A
Empirical probability: short-run relative frequencies based on observed trials
- Law of large numbers: with a large number of trials, the empirical probability approaches the theoretical probability
Sample space: a list that contains all possible (and equally likely) outcomes

Chapter 6 Modeling Random

- Random variables can be discrete (no decimals, values can be listed or counted) or continuous (any value over a range)

- Use a probability model to predict the likelihood of an event
-> Normal model
-> Binomial model
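As a rough illustration of using the two models to get the likelihood of an event, the sketch below relies only on the Python standard library (`math.comb` and `statistics.NormalDist`). The helper `binomial_pmf` and all the numbers (10 coin flips, mean 100, standard deviation 15) are hypothetical values chosen for the example, not taken from the notes.

```python
from math import comb
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Binomial model (discrete): number of heads in 10 flips of a fair coin.
p_exactly_5 = binomial_pmf(5, 10, 0.5)
p_at_most_2 = sum(binomial_pmf(k, 10, 0.5) for k in range(3))

# Normal model (continuous): hypothetical mean 100 and standard deviation 15.
model = NormalDist(mu=100, sigma=15)
p_below_115 = model.cdf(115)                 # P(X < 115)
p_between = model.cdf(115) - model.cdf(85)   # P(85 < X < 115)

print(round(p_exactly_5, 4), round(p_at_most_2, 4))
print(round(p_below_115, 4), round(p_between, 4))
```

The binomial model suits counts of successes in a fixed number of trials; the normal model suits continuous measurements.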

Chapter 7 Sampling
Sample: used to make decisions about the population
- Samples have to be:
1. Part of the whole population
2. Randomized: makes sure that, on average, the sample represents the whole population (avoids bias)
3. Of adequate size: sample size makes a difference in sampling
Population: the group of objects being studied
Parameter: a numerical value that characterizes some aspect of the population, e.g. a probability
Census: a survey in which every member of the population is measured
-> may be too expensive
-> takes time
-> may be destructive in nature (something has to be destroyed to get the data):
   -> asking someone to drink 100 cans of beer
   -> killing all the animals, etc.
Statistical inference: drawing conclusions about a population on the basis of observing only a small subset of the population
-> involves uncertainty (use terms like: predict, maybe)

Sampling bias: occurs because the sample doesn't represent the population
1. Voluntary-response bias
-> online survey: only people with strong feelings will fill in the survey
-> only provides info about the people doing the survey
-> not a scientific poll
2. Non-response bias: people fail to answer a question or respond to a survey
-> people, for whatever reason, choose to provide wrong answers or no response

Measurement bias: asking questions that do not produce a true answer
- People may overestimate or underestimate
- Leading questions (too much info) guide people toward an answer -> be aware of the phrasing of the question too
- Double-barreled questions, e.g. are you satisfied with UBC and your faculty?
How to know there is bias?
- Only a small group of the sample replies to the survey -> bias
- Whether the researchers chose the participants
- Whether the researchers left out feedback that is totally different from the rest of the population

Problems with surveys
1. Convenience sampling, e.g. standing at the UBC bus loop asking students whether they think the U-Pass is a good policy -> forgets to include the people who drive
2. Undercoverage, e.g. phoning only during specific times -> some part of the population gets a smaller representation in the sample than it has in the population
3. Boring survey

Simple random sampling (SRS)
- Population should be at least 10 times the sample size
- Minimizes bias
- 1st step -> define where the sample will come from: the sampling frame
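A small sketch of drawing a simple random sample from a sampling frame, assuming the frame is available as a Python list. The function name `simple_random_sample`, the frame of student IDs, and the seed are all hypothetical. The size check mirrors the 10-times rule of thumb from the notes.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct individuals from the sampling frame, each subset equally likely."""
    if len(frame) < 10 * n:
        # Rule of thumb from the notes: population at least 10x the sample size.
        raise ValueError("population should be at least 10 times the sample size")
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement

# Hypothetical sampling frame: student IDs 1..5000.
sampling_frame = list(range(1, 5001))
sample = simple_random_sample(sampling_frame, 100, seed=42)
print(len(sample), len(set(sample)))
```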

Systematic sampling: collect data on every nth individual
- E.g. students whose student number ends in 9 do the survey
- Produces a random sample if done correctly
-> include representatives from all times and locations, and make sure the population doesn't have a cycle that lines up with n
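One possible way to sketch "every nth individual" in code, assuming the population is an ordered list. The function name `systematic_sample`, the random starting point, and the made-up student numbers are assumptions for illustration. With consecutive student numbers and a step of 10, every selected number ends in the same digit, as in the example above.

```python
import random

def systematic_sample(population, step, seed=None):
    """Pick a random start in the first `step` positions, then take every step-th individual."""
    if step < 1:
        raise ValueError("step must be at least 1")
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start so no position is favoured
    return population[start::step]

student_numbers = list(range(10_000, 10_200))  # 200 hypothetical student numbers
sample = systematic_sample(student_numbers, step=10, seed=1)
print(len(sample), sample[:3])
```

If the list itself repeats in a cycle of length 10 (for example, ordered by weekday in blocks of 10), this selection would pick the same kind of individual every time, which is the cycle problem the notes warn about.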

Stratified sampling
- Individuals within a stratum are roughly homogeneous; strata are different from one another
- The population is first sliced into homogeneous groups (strata) before the sample is selected, e.g. group LFS students according to specialization
-> Then conduct a systematic random sample within each stratum
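A hedged sketch of the stratify-then-sample idea. For brevity it draws a simple random sample inside each stratum rather than a systematic one, which is a substitution on my part; the function name `stratified_sample`, the roster, the specialization names, and the 10% sampling fraction are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(individuals, stratum_of, fraction, seed=None):
    """Group individuals into strata, then draw a random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in individuals:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, k))       # simple random draw (an assumption)
    return sample

# Hypothetical roster of (student id, specialization) pairs.
specializations = ["Nutrition"] * 60 + ["Food Science"] * 30 + ["Agroecology"] * 10
roster = list(enumerate(specializations))
sample = stratified_sample(roster, stratum_of=lambda s: s[1], fraction=0.1, seed=7)
print(len(sample))
```

Sampling the same fraction from every stratum keeps each specialization represented in proportion to its size.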

Cluster sampling
- A cluster: individuals within it are different; clusters are about alike
- The method divides the population into distinct groups (clusters) using some sort of natural or convenient distinction
- Split the population into similar, smaller groups
- Then select one or a few clusters at random and look at every member of the selected clusters
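The procedure above could be sketched as follows, assuming the clusters are already known and stored in a dictionary. The function name `cluster_sample`, the lab sections, and the member labels are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and return every member of the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # pick cluster names at random
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

# Hypothetical population: 8 lab sections of 25 students each.
clusters = {f"section_{k}": [f"s{k}_{i}" for i in range(25)] for k in range(8)}
chosen, members = cluster_sample(clusters, n_clusters=2, seed=3)
print(chosen, len(members))
```

Unlike stratified sampling, randomness enters only in the choice of clusters; everyone inside a chosen cluster is included.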

Accurate / precise
- Sampling distribution: the word used to describe its SD is standard error (SE)
- Precision can be improved by using a larger sample size
- Population size has no influence on bias or on precision (standard error)
- But as the sample size increases, the SE decreases; the mean stays the same
- No bias: the mean of the sampling distribution of the sample proportion equals the population proportion
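To make the sample-size effect concrete, here is a short sketch using the usual standard error formula for a sample proportion, sqrt(p(1 - p)/n). The formula is not written out in the notes, and the value p = 0.5 is a hypothetical choice. Note that the population size does not appear in the formula.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

p = 0.5  # hypothetical population proportion
for n in (100, 400, 1600):
    print(n, round(standard_error(p, n), 4))
```

Quadrupling the sample size halves the standard error, so precision improves, but with diminishing returns.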

Simple random sampling distribution
- The true measurement (the population parameter) stays the same
- But! The statistic we compute from a sample to estimate the population parameter changes from sample to sample -> the statistics will differ
-> We don't do this repeated sampling in the real world, because a sample will rarely sit at the extreme high or low end of the distribution
Difference between sample distribution & sampling distribution
- Sample distribution: take 1 sample and plot a histogram or bar chart of all observations
- Sampling distribution: take many samples from the population, calculate a statistic for each sample, and plot the frequency or probability of those statistics
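A simulation sketch of a sampling distribution, assuming a very large population in which a proportion `p_true` has the trait of interest, so each draw can be treated as independent. The function name `sampling_distribution`, the seed, and the values 0.3, 100, and 2000 are hypothetical. Each simulated sample yields one sample proportion, and the collection of those proportions is the sampling distribution.

```python
import random
from statistics import mean, stdev

def sampling_distribution(p, n, n_samples, seed=0):
    """Draw n_samples samples of size n and return the sample proportion of each."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(n_samples)]

p_true, n = 0.3, 100  # hypothetical population proportion and sample size
p_hats = sampling_distribution(p_true, n, n_samples=2000)

# The statistic varies from sample to sample, but its average sits near p_true
# and its spread is the standard error.
print("mean of sample proportions:", round(mean(p_hats), 3))
print("spread (standard error):   ", round(stdev(p_hats), 3))
```

A histogram of `p_hats` would show the sampling distribution; a histogram of the 100 observations in any single sample would show a sample distribution.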

Sampling distribution
Central limit theorem conditions:
1. The sample has to be random and independent
2. Large sample: the sample has at least 10 successes, np >= 10, and at least 10 failures, n(1-p) >= 10
3. Large population: if the sample is collected without replacement, then the population size is at least 10 times the sample size
Confidence intervals
1. Use a confidence interval to get plausible bounds on a population proportion
2. Do not use if it's not a large sample: np