Data: Quantitative data is numeric data. Qualitative (categorical) data is non-numeric. Discrete data has a limited number of values. Continuous data can take an infinite number of values; once rounded, it becomes discrete. Identifiers are used for identification only (not quantitative).

Surveys: Sample: survey a subset of the population to generalise about the population. Census: information about the whole population; more accurate, but expensive, time consuming and may destroy the population. Parameter: value for the population. Statistic: value calculated from sample data.

How to sample: Simple random sample (SRS): every individual has an equal chance of being selected, so the sample is representative of the population. Stratified: divide the population into strata and take an SRS from each (individuals are similar within a stratum, but different between strata). Cluster: choose clusters at random and take a census in them (clusters are similar to each other, with variety within each). Systematic: use a fixed interval to select members from the sampling frame. Multistage: combine several methods, like SRS and stratified.

Errors: Sampling error (sampling variability): the difference between statistics and parameters due to the random process. Can be reduced by increasing the sample size. Bias (non-sampling error): present regardless of sample size; comes from errors in the sampling methodology. Causes the long-run average to be wrong! Arises from undercoverage, convenience sampling, voluntary response bias, non-response bias or response bias. IF TRULY RANDOM, NO BIAS!

Categorical data: Make a picture! Frequency table, contingency table (shows joint frequencies), relative frequency table (shows %), conditional % table (percentages are conditional on a category of the other variable, i.e. % of that row or column total). Use a bar or pie chart. Simpson's paradox: the conclusion drawn about the total doesn't match the conclusion about the groups within it. It comes from inappropriately combining percentages of different groups, from not using comparable measurements for comparable individuals, or because the observed relationship between two variables
may be affected by other ignored variables, like difficulty. If two categorical variables are related, say association, not correlation or causation!

Quantitative data: Picture: histogram (bars are vertical) or stem-and-leaf. Describe: Shape: unimodal, bimodal, multimodal or uniform? Symmetric, or skewed (negative/left skew has its tail to the left)? If negatively skewed, the mean is less than the median. Any outliers or gaps? Centre: mean (simple average), which is affected by outliers; median (middle data value), which is not affected by outliers; or mode (value(s) that occur most, used for categorical data). Spread: range (max - min), IQR (Q3 - Q1, which shows the middle 50%). Variance: shows how each value varies from the mean, in squared units. The standard deviation is in the same units as the data. $s^2 = \frac{\sum y^2 - (\sum y)^2 / n}{n - 1}$, so standard deviation $s = \sqrt{s^2}$ (it is s, not ±s).

Box plots: Use a 5-number summary (Min, Q1, Median, Q3, Max), plus n. Fences: first compute 1.5 × IQR, then lower fence = Q1 - 1.5 × IQR and upper fence = Q3 + 1.5 × IQR. Each whisker ends at the data value closest to the fence (but not beyond it). Mark anything beyond with a dot (or a separate symbol).

Standardizing: $z = \frac{y - \bar{y}}{s}$. Does not change the shape; shifts the mean to 0 and rescales the SD to 1. A negative z-score shows the value is below the mean. Shows how unusual a data value is, as it gives the number of standard deviations the value is from the mean.

Correlation and regression: Scatterplots: only for two quantitative variables! When commenting, mention direction, form (whether linear), strength and outliers. E.g. moderate, positive, linear scatter with no outliers. Correlation coefficient (r): e.g. positive, moderate linear relationship.

Linear model: the line of best fit, also called regression, ordinary least squares or simple linear regression.

Equation: $\hat{y}$ = intercept + slope × X. Interpret equation: this shows the estimated average linear relationship between [variables]. Intercept interpretation: we estimate that when the explanatory variable is 0, the response variable will on average = _. Slope: we estimate that when [X] increases by 1 unit, [Y] will increase by _ on average.

Conditions to check (always do first!): quantitative data, linearity, independence of the residuals (they can't relate to one another) and equal spread of the residuals (around the regression line).

Residual: = Y value - the point on the line of best fit, or = actual - predicted. Variability: the SD of the residuals (standard error, SE) is estimated by: $s_e = \sqrt{\frac{\sum \text{residual}^2}{n - 2}}$. Compare it with the SD of Y, as SE measures the variability of Y values around the regression line. Residual/SE = how far a point is from the line, standardized. If a point is within 2 SEs of the line, it is not an outlier. The SD of Y should be greater than the SE of the regression, as SD squares distances from the mean rather than from the regression line.

R² (coefficient of determination): Interpret: [%] of the total variation in [Y variable] can be explained by the linear relationship with [X variable].

Probability: Disjoint (mutually exclusive): two events where, if one occurs, the other can't. Exhaustive events: cover all outcomes. Independent: one
event occurs without changing the probability of another occurring. Events cannot be both disjoint and independent! Law of large numbers: if the events are independent, then the long-run relative frequency of an event gets closer to a single value as the number of trials increases. The Law of Averages is wrong! Probability does not change due to past events!

3 methods of evaluating probabilities: Empirical (long-run relative frequency). Theoretical (P(A) = # of outcomes in A / total # of outcomes). Personal (express own subjective uncertainty). Rules: probabilities must lie between 0 and 1. The probabilities of all possible outcomes must sum to 1. Be wary not to double count!
-> P(A) = 1 - P(not A)
-> P(A or B) = P(A) + P(B), provided A and B are disjoint
-> P(A or B) = P(A) + P(B) - P(A and B)
-> P(A|B) = P(A and B) / P(B)
-> P(A and B) = P(A) × P(B|A)
Independence: P(A and B) = P(A) × P(B), or P(A and B) / P(B) = P(A). (If disjoint, P(A and B) = 0.)

Random Variables and Probability Models: Expected value: long-run average, showing the weighted average of outcomes. $\mu = E[X] = \sum x \, P(x)$.
Variance: $V(X) = E[X^2] - (E[X])^2$ (SQUARED UNITS).
Standard deviation: $SD(X) = \sqrt{V(X)}$ (NORMAL UNITS).
Expected value rules:
1. E[c] = c
2. E[cX] = cE[X]
3. E[X+Y] = E[X] + E[Y]
4. E[X+c] = E[X] + c
5. E[X-Y] = E[X] - E[Y]
Variance rules:
1. V[c] = 0
2. V[cX] = c²V[X]
3. V[X+c] = V[X]
4. V[X+Y] = V[X] + V[Y] (if independent)
5. V[X-Y] = V[X] + V[Y] (if independent)
Rule 3 is intuitive, as adding a number to every value in a dataset does not change the variability; it just shifts the values up or down the number line.

The Normal Model: The Empirical (68-95-99.7) Rule: about 68% of values lie within 1 SD of the mean, 95% within 2 SDs and 99.7% within 3 SDs. Normal models are appropriate if the data is unimodal (mound shaped, clustered around the middle) and symmetric (mean = median = mode). The range is infinite, but only about 3 in 1000 values are beyond 3 SDs from the mean.
Standardizing: $z = \frac{y - \mu}{\sigma}$ (mean becomes 0, SD becomes 1).
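The standardizing formula and the 68-95-99.7 rule above can be checked numerically. The following is a minimal sketch using Python's standard library; the function names `z_score` and `share_within` and the example numbers (mean 100, SD 15) are illustrative assumptions, not part of the notes.

```python
from statistics import NormalDist


def z_score(y, mu, sigma):
    """Number of standard deviations that y lies from the mean mu."""
    return (y - mu) / sigma


# Standard normal model: mean 0, SD 1 (what standardizing produces).
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Proportion of a normal model lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Hypothetical example: the value 130 in a model with mean 100 and SD 15.
z = z_score(130, mu=100, sigma=15)
print(f"z = {z:.2f}")

# Compare these proportions with the 68-95-99.7 rule.
for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

A positive z means the value is above the mean, and the proportion outside 3 SDs (one minus the last figure printed) corresponds to the "about 3 in 1000" statement.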
Sampling Distributions (inference): For a proportion: p is the true proportion and $\hat{p}$ is the sample value, which will vary due to sampling variability. Describe: Shape: normal (if conditions hold). Centre: the mean of $\hat{p}$ is the true proportion p. Spread: variance is $pq/n$, where $q = 1 - p$; SD is $\sqrt{pq/n}$. Conditions: random, 10% condition (n/N less than 0.1, or sampling with replacement) and success/failure (np and nq must both be at least 10; calculate them!).
Standardizing: $z = \frac{\hat{p} - p}{\sqrt{pq/n}}$
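The checks and the standardizing step for a proportion can be sketched in a few lines of Python. The function names, the dictionary keys and the example numbers are hypothetical; the randomness condition cannot be verified from the numbers alone and is left to the reader.

```python
import math


def proportion_sampling_model(p, n, population_size=None):
    """Return (sd, conditions) for the sampling distribution of p-hat.

    sd is sqrt(p*q/n). The conditions dict reports the success/failure
    check and, if a population size is given, the 10% check.
    """
    q = 1 - p
    sd = math.sqrt(p * q / n)
    conditions = {
        "success_failure": n * p >= 10 and n * q >= 10,
        # None means the 10% condition was not checked (size unknown
        # or sampling done with replacement).
        "ten_percent": None if population_size is None
        else n / population_size < 0.1,
    }
    return sd, conditions


def z_for_proportion(p_hat, p, n):
    """Standardize a sample proportion: (p-hat - p) / sqrt(p*q/n)."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)


# Hypothetical numbers: true p = 0.5, sample of 100 from 10,000 people.
sd, conditions = proportion_sampling_model(p=0.5, n=100, population_size=10_000)
print(sd, conditions)
print(z_for_proportion(p_hat=0.56, p=0.5, n=100))
```

Returning the conditions alongside the SD is a design choice for this sketch: it keeps the "check conditions first" habit visible next to the calculation.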
For a mean: Describe: Shape: the sampling distribution of $\bar{y}$ is normal if conditions hold. Centre: the mean of the sample means is always the population mean (µ), but each individual sample mean varies. Spread: variance is $\sigma^2/n$ and SD is $\sigma/\sqrt{n}$.
Conditions: random, 10% and nearly normal condition. For nearly normal, either the original population y is normal, or the Central Limit Theorem applies: the sampling distribution of any mean becomes normal as the sample size grows. However, if the population is very skewed, this will require hundreds of observations. If n is below 15, the data should follow a normal model closely. If n is between 15 and 40, the data should be unimodal and symmetric. If n is over 40, $\bar{y}$ is normally distributed unless the population is extremely skewed.
Standardizing: $z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$
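The claims about centre and spread can be illustrated with a small simulation. This is a sketch under assumptions: the population is taken to be exponential with rate 1 (mean 1, SD 1, strongly right-skewed), and the helper names `simulate_sample_means` and `z_for_mean` are made up for the example.

```python
import math
import random
import statistics


def simulate_sample_means(draw, n, reps, seed=0):
    """Draw `reps` samples of size n and return the list of sample means.

    `draw` is a function that takes a random.Random instance and returns
    one observation from the population.
    """
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]


def z_for_mean(y_bar, mu, sigma, n):
    """Standardize a sample mean: (y-bar - mu) / (sigma / sqrt(n))."""
    return (y_bar - mu) / (sigma / math.sqrt(n))


# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
means = simulate_sample_means(lambda rng: rng.expovariate(1.0), n=40, reps=2000)

print(statistics.fmean(means))   # should be close to mu = 1
print(statistics.stdev(means))   # should be close to sigma / sqrt(40)
print(1 / math.sqrt(40))         # theoretical SD of the sample mean
```

Even though the assumed population is skewed, the mean of the simulated sample means sits near µ and their SD near σ/√n, which is the behaviour the Central Limit Theorem describes.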
Standard Error: used when the standard deviation of a sampling distribution is estimated from the sample (this measures sampling variability). For a proportion: $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$. For a mean: $SE(\bar{y}) = s/\sqrt{n}$.
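Both standard errors can be computed directly from sample values. The sketch below assumes the usual forms sqrt(p̂q̂/n) for a proportion and s/√n for a mean; the function names and the data are hypothetical.

```python
import math
import statistics


def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def se_mean(sample):
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical survey: 40% of 150 respondents answered yes.
print(se_proportion(0.4, 150))

# Hypothetical measurements.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(se_mean(sample))
```

`statistics.stdev` divides by n - 1, so it matches the sample standard deviation s used in the notes.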
Confidence Intervals for proportions: CI for p: point estimate ± margin of error. Steps: 1. PLAN: Identify the parameter and the % of confidence. Check the conditions (random, 10% (or replacement), success/failure). State whether the sampling distribution is normal.