WEEK 1: DESCRIPTIVE STATISTICS
WHAT IS STATISTICS?
- Statistic: a summary of data (data are measurements of events)
- Field of statistics: the collecting, analyzing and understanding of data measured with uncertainty
GRAPHS
QUANTITATIVE OR CATEGORICAL
- A categorical variable places an individual into one of several categories
- A quantitative variable takes numerical values, measured on a scale
RECOMMENDED GRAPHICAL TOOLS
If you want to summarize one variable
- and it is quantitative: a histogram or boxplot
- and it is categorical: a bar graph (or "bar chart")
If you want to summarize two variables
- and both are quantitative: a scatterplot
- and both are categorical: a clustered bar chart or a jittered scatterplot
- and one is categorical and the other quantitative: comparative boxplots or comparative histograms
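The recommendations above amount to a small lookup keyed by the types of the variables involved. The sketch below encodes that lookup; the function name `recommend_graph` and the string labels are hypothetical, chosen only for illustration.

```python
# Lookup of the recommended graphs listed above, keyed by variable type.
# recommend_graph is a hypothetical helper, not part of any library.
def recommend_graph(*kinds):
    """kinds: one or two of 'quantitative' / 'categorical'."""
    table = {
        ("quantitative",): ["histogram", "boxplot"],
        ("categorical",): ["bar graph"],
        ("quantitative", "quantitative"): ["scatterplot"],
        ("categorical", "categorical"): ["clustered bar chart", "jittered scatterplot"],
        ("categorical", "quantitative"): ["comparative boxplots", "comparative histograms"],
    }
    key = tuple(sorted(kinds))  # the order of the two variables does not matter
    if key not in table:
        raise ValueError(f"unsupported variable types: {kinds}")
    return table[key]


print(recommend_graph("quantitative"))
print(recommend_graph("quantitative", "categorical"))
```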
WHAT TO LOOK FOR IN A GRAPH?
Consider:
- The location (where most of the data are) and spread (or variability) of the data
- The shape of the data (symmetric, left-skewed or right-skewed)
- Any unusual observations
OTHER GRAPHICAL TOOLS
- Stem-and-leaf plots
- Pie charts
- Time plots
- Dot plots
KNGX NOTES MATH1041
SUMMARIES OF QUANTITATIVE VARIABLES
TYPES OF NUMERICAL SUMMARIES
Examples of numerical summaries are
- Proportions or percentages
- Mean or average
- Median
- Interquartile range (IQR)
- Standard deviation
RECOMMENDED NUMERICAL SUMMARIES
If you want to summarize one categorical variable:
- Table of frequencies or percentages
If you want to summarize one quantitative variable:

Measures of    Commonly used             Robust to outliers
Location       Mean (x̄)                  Median (M)
Spread         Standard deviation (s)    Interquartile range (IQR)
MEASURES OF LOCATION
MEAN
- The mean is another name for what is commonly called the average of a set of numbers
- A common notation for the mean of n values $x_1, x_2, \dots, x_n$ is: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- The mean has a physical interpretation as the center of gravity of the data
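As a minimal sketch of the mean, assuming the usual definition (the sum of the values divided by how many there are), the example below also illustrates the "center of gravity" idea: deviations from the mean balance out, so they sum to zero. The data are made up for illustration.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [2, 4, 4, 5, 10]                  # made-up illustrative data
x_bar = mean(data)                       # 25 / 5
deviations = [x - x_bar for x in data]   # signed distances from the mean

print(x_bar)
print(sum(deviations))                   # the deviations balance out around the mean
```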
MEDIAN
- The median is the "middle value"
- For n values sorted as $x_1 \le x_2 \le \dots \le x_n$, the median is:
  - $x_{(n+1)/2}$ if n is odd (the middle number)
  - the average of $x_{n/2}$ and $x_{n/2+1}$ if n is even (the average of the two middle numbers)
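The odd and even cases of the median can be sketched directly from the definition above. Note that Python lists are 0-indexed, so the 1-based position (n+1)/2 becomes index n // 2; the data are made up for illustration.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                       # 1-based position (n+1)/2
    return (xs[mid - 1] + xs[mid]) / 2       # 1-based positions n/2 and n/2 + 1


print(median([7, 1, 3]))        # odd n: the middle of 1, 3, 7
print(median([7, 1, 3, 10]))    # even n: the average of 3 and 7
```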
BOXPLOTS
- The horizontal line in the middle of each box corresponds to the median
- The edges of the box correspond to the medians of the lower and upper halves of the data: Q1 = first quartile, Q3 = third quartile
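A minimal sketch of the quartiles as medians of the two halves of the sorted data. One assumption is labeled here because the notes do not state it: when n is odd, the overall median is left out of both halves (a common textbook convention; software packages often use other rules and may give slightly different quartiles). The function name `quartiles` is hypothetical.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the 'median of each half' rule."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]          # values below the median position
    upper = xs[n - half:]      # values above it (the middle value is skipped if n is odd)
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)


print(quartiles([1, 3, 4, 6, 8, 9, 12]))   # made-up illustrative data
```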
MEASURES OF SPREAD
- Interquartile range (IQR): a simple measure of spread is the height of the box part of the boxplot: IQR = Q3 - Q1
- Standard deviation (s) is another measure of spread
- Note: IQR and s are often similar, although the standard deviation can be heavily influenced by outliers
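To illustrate the note about outliers, the sketch below compares the IQR and the standard deviation on a small made-up data set before and after its largest value is replaced by an extreme one. Two assumptions: s is the sample standard deviation (divisor n - 1), as computed by `statistics.stdev`, and the quartiles are the medians of the lower and upper halves of the sorted data.

```python
import statistics


def iqr(values):
    """IQR = Q3 - Q1, with quartiles taken as medians of the two halves."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    q1 = statistics.median(xs[:half])
    q3 = statistics.median(xs[n - half:])
    return q3 - q1


data = [10, 12, 13, 15, 16, 18, 20]           # made-up illustrative data
with_outlier = [10, 12, 13, 15, 16, 18, 200]  # largest value replaced by an outlier

print(iqr(data), iqr(with_outlier))                            # the IQR does not move
print(statistics.stdev(data), statistics.stdev(with_outlier))  # s grows substantially
```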
RECOMMENDED NUMERICAL SUMMARIES
If you want to summarize one categorical variable:
- Table of frequencies or percentages
If you want to summarize one quantitative variable:
- Five-number summary: Min, Q1, M, Q3, Max, where Min and Max are the smallest and largest values
BOXPLOT TERMINOLOGY
- Boxplot: stems go from the box to the minimum and maximum (a visual representation of the five-number summary)
- Modified boxplot: stems use the 1.5 x IQR criterion for outliers
OUTLIER IDENTIFICATION
There is no clear-cut rule, but the textbook recommends the 1.5 x IQR criterion. An observation is a suspected outlier if it is:
- more than 1.5 x IQR lower than Q1, or
- more than 1.5 x IQR higher than Q3
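The 1.5 x IQR criterion can be sketched as two "fences" around the box. The function name `suspected_outliers` is hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data; other quartile conventions can flag slightly different points.

```python
import statistics


def suspected_outliers(values):
    """Return the observations flagged by the 1.5 x IQR criterion."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    q1 = statistics.median(xs[:half])
    q3 = statistics.median(xs[n - half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr     # anything below this is a suspected outlier
    high_fence = q3 + 1.5 * iqr    # anything above this is a suspected outlier
    return [x for x in xs if x < low_fence or x > high_fence]


print(suspected_outliers([10, 12, 13, 15, 16, 18, 40]))   # made-up illustrative data
```

In this example Q1 = 12 and Q3 = 18, so the fences sit at 3 and 27 and only the value 40 is flagged.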
LINEAR TRANSFORMATION
A linear transformation is the transformation of a variable from x to x_new as follows: $x_{\text{new}} = a + bx$
where x is some quantitative measurement and a and b are constants
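As a hedged illustration, assume the transformation has the usual form x_new = a + b x, for example converting Celsius to Fahrenheit with a = 32 and b = 1.8. The sketch below applies it to made-up data and checks numerically that the mean and median are shifted and scaled (a + b times the old value), while the standard deviation is only scaled by |b|.

```python
import math
import statistics

a, b = 32.0, 1.8                          # assumed example: Celsius -> Fahrenheit
celsius = [10, 12, 13, 15, 16, 18, 20]    # made-up illustrative data
fahrenheit = [a + b * x for x in celsius]

mean_c, mean_f = statistics.mean(celsius), statistics.mean(fahrenheit)
med_c, med_f = statistics.median(celsius), statistics.median(fahrenheit)
sd_c, sd_f = statistics.stdev(celsius), statistics.stdev(fahrenheit)

# Location measures are shifted by a and scaled by b.
print(math.isclose(mean_f, a + b * mean_c))
print(math.isclose(med_f, a + b * med_c))
# Spread is only scaled, by |b|; the shift a drops out.
print(math.isclose(sd_f, abs(b) * sd_c))
```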
HOW LINEAR TRANSFORMATIONS AFFECT CENTRAL LOCATION
Recall that the measures of central location are the mean (x̄) and the median (M)
HOW LINEAR TRANSFORMATION AFFECTS SPREAD
Recall that the measures of spread are: