WEEK 1: DESCRIPTIVE STATISTICS

WHAT IS STATISTICS?

- Statistic: a summary of data (which are measures of events)
- Field of statistics: the collecting, analyzing and understanding of data measured with uncertainty

GRAPHS

QUANTITATIVE OR CATEGORICAL
- A categorical variable places an individual into one of several categories.
- A quantitative variable takes numerical values, measured on a scale.

RECOMMENDED GRAPHICAL TOOLS

If you want to summarize one variable:
- and it is quantitative: a histogram or boxplot
- and it is categorical: a bar graph (or “bar chart”)

If you want to summarize two variables:
- and both are quantitative: a scatterplot
- and both are categorical: a clustered bar chart or a jittered scatterplot
- and one is categorical and the other quantitative: comparative boxplots or comparative histograms
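A histogram cuts the range of a quantitative variable into equal-width bins and counts how many values fall in each bin. The Python sketch below prints a crude text version of that idea. The function names, the bin width of 10, the starting point of 150 and the height data are all hypothetical and chosen only for illustration.

```python
def bin_counts(values, bin_width, start):
    """Count values in each bin [start + k*bin_width, start + (k+1)*bin_width).

    Assumes start <= min(values); a smaller value would get a negative bin index.
    """
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return counts


def text_histogram(values, bin_width, start):
    """Print one row per bin, with one asterisk per observation in that bin."""
    for k, count in enumerate(bin_counts(values, bin_width, start)):
        lo = start + k * bin_width
        print(f"[{lo}, {lo + bin_width}) | {'*' * count}")


# Hypothetical heights (cm) of 10 people
heights = [152, 158, 161, 163, 165, 166, 170, 171, 174, 182]
text_histogram(heights, bin_width=10, start=150)
```

The bins here contain 2, 4, 3 and 1 values. A different bin width or starting point can change the shape that the histogram shows, so it is worth trying more than one.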

WHAT TO LOOK FOR IN A GRAPH?

Consider:
- The location (where most of the data are) and spread (or variability) of the data
- The shape of the data (symmetric, left-skewed or right-skewed)
- Any unusual observations

OTHER GRAPHICAL TOOLS
- Stem-and-leaf plots
- Pie charts
- Time plots
- Dot plots


KNGX NOTES MATH1041

SUMMARIES OF QUANTITATIVE VARIABLES

TYPES OF NUMERICAL SUMMARIES
Examples of numerical summaries are:
- Proportions or percentages
- Mean or average
- Median
- Interquartile range (IQR)
- Standard deviation

RECOMMENDED NUMERICAL SUMMARIES

If you want to summarize one categorical variable:
- Table of frequencies or percentages
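A table of frequencies or percentages simply counts how often each category occurs. The sketch below builds one with Python's standard-library `collections.Counter`. The function name `frequency_table` and the transport data are hypothetical.

```python
from collections import Counter


def frequency_table(observations):
    """Return (category, count, percentage) rows, most frequent category first."""
    n = len(observations)
    counts = Counter(observations)
    return [(category, k, 100 * k / n) for category, k in counts.most_common()]


# Hypothetical categorical data: how 8 students travel to campus
transport = ["bus", "train", "bus", "car", "walk", "bus", "train", "car"]
for category, k, pct in frequency_table(transport):
    print(f"{category:<6} {k:>3} {pct:6.1f}%")
```

Here "bus" appears 3 times out of 8 (37.5%). The percentages always add to 100%.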

If you want to summarize one quantitative variable:

                      Measures of location   Measures of spread
Commonly used         Mean (x̄)               Standard deviation (s)
Robust to outliers    Median (M)             Interquartile range (IQR)

MEASURES OF LOCATION

MEAN
- The mean is just another name for what is commonly called the average of a set of numbers.
- A common notation for the mean is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

- The mean has a physical interpretation as the center of gravity (balance point) of the data.
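The formula x̄ = (1/n) Σ xᵢ can be checked numerically. The sketch below uses a small hypothetical data set. It computes the mean and also illustrates the center-of-gravity interpretation: the deviations xᵢ − x̄ on either side of the mean cancel out, so they sum to zero.

```python
def mean(xs):
    """Sample mean: add up the n values and divide by n."""
    return sum(xs) / len(xs)


data = [2, 4, 4, 5, 10]            # hypothetical data, n = 5
x_bar = mean(data)                 # (2 + 4 + 4 + 5 + 10) / 5 = 25 / 5
deviations = [x - x_bar for x in data]

print(x_bar)            # 5.0
print(sum(deviations))  # 0.0: the data "balance" at the mean
```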

MEDIAN
- The median is the “middle value”.
- For n values sorted as $x_1 \le x_2 \le \cdots \le x_n$:
  - $M = x_{(n+1)/2}$ if n is odd (the middle number)
  - $M = \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right)$ if n is even (the average of the two middle numbers)
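The odd/even rule for the median translates almost directly into code. The only adjustment is that Python lists are indexed from 0, whereas the formulas count positions from 1. In the minimal sketch below, the function name `median` is our own. The standard-library `statistics.median` is used only as a cross-check.

```python
import statistics


def median(xs):
    """Median via the odd/even rule; positions are 1-based in the notes, 0-based here."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]             # x_{(n+1)/2}
    return (s[n // 2 - 1] + s[n // 2]) / 2     # average of x_{n/2} and x_{n/2+1}


print(median([7, 1, 3]))      # n = 3 is odd: sorted [1, 3, 7], middle value 3
print(median([4, 1, 3, 2]))   # n = 4 is even: (2 + 3) / 2 = 2.5
print(statistics.median([4, 1, 3, 2]))  # standard-library cross-check: 2.5
```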

BOXPLOTS
- The horizontal line in the middle of the box corresponds to the median.


- The edges of the box correspond to the medians of the lower and upper halves of the data: Q1 = first quartile, Q3 = third quartile.
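The sketch below computes Q1 and Q3 as the medians of the lower and upper halves of the data. It assumes a common textbook convention: when n is odd, the middle value (the median itself) is left out of both halves. Software often uses other rules. NumPy's `percentile`, for example, interpolates by default, so its quartiles can differ slightly from this hand method. The function names and data are hypothetical.

```python
def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def quartiles(xs):
    """Return (Q1, M, Q3), with Q1 and Q3 the medians of the lower and upper halves.

    Assumed convention: when n is odd, the median is excluded from both halves.
    Needs at least 2 values.
    """
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]             # values below the median position
    upper = s[(n + 1) // 2 :]       # values above the median position
    return median(lower), median(s), median(upper)


print(quartiles([2, 4, 5, 7, 9, 11, 13]))    # n = 7: (4, 7, 11)
print(quartiles([1, 3, 4, 6, 7, 8, 9, 12]))  # n = 8: (3.5, 6.5, 8.5)
```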

MEASURES OF SPREAD

Interquartile Range (IQR): a simple measure of spread is the height of the box part of the boxplot:

IQR = Q3 − Q1

Standard Deviation (s) is another measure of spread.

Note: IQR and s are often similar in size, although the standard deviation can be heavily influenced by outliers, while the IQR is not.
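The different sensitivity of s and the IQR to outliers is easy to demonstrate. In the sketch below, one hypothetical value is mistyped as 130 instead of 13. The IQR does not move, but s grows roughly twelvefold. The IQR uses the halves-of-the-data quartiles (median excluded when n is odd, an assumed convention). The standard deviation comes from the standard-library `statistics.stdev`, which computes the sample standard deviation with divisor n − 1.

```python
from statistics import stdev  # sample standard deviation (divides by n - 1)


def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def iqr(xs):
    """Q3 - Q1, using medians of the lower and upper halves (median excluded when n is odd)."""
    s = sorted(xs)
    n = len(s)
    return median(s[(n + 1) // 2 :]) - median(s[: n // 2])


clean = [2, 4, 5, 7, 9, 11, 13]          # hypothetical data
with_outlier = [2, 4, 5, 7, 9, 11, 130]  # same data with 13 mistyped as 130

print(iqr(clean), iqr(with_outlier))              # 7 7: the IQR is unchanged
print(round(stdev(clean), 2), round(stdev(with_outlier), 2))  # s jumps from roughly 4 to roughly 47
```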


FIVE-NUMBER SUMMARIES
The textbook advocates the five-number summary:
1. Min
2. Q1
3. M
4. Q3
5. Max

where Min and Max are the smallest and largest values.
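The five-number summary collects the minimum, the two quartiles, the median and the maximum in one place. A minimal sketch follows. It assumes quartiles are the medians of the lower and upper halves, with the median left out of both halves when n is odd. Other conventions can give slightly different Q1 and Q3. The function name and data are hypothetical.

```python
def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def five_number_summary(xs):
    """Return (Min, Q1, M, Q3, Max); Q1/Q3 are medians of the lower/upper halves,
    leaving the median out of both halves when n is odd (assumed convention)."""
    s = sorted(xs)
    n = len(s)
    return (s[0], median(s[: n // 2]), median(s), median(s[(n + 1) // 2 :]), s[-1])


print(five_number_summary([13, 2, 9, 5, 11, 4, 7]))  # (2, 4, 7, 11, 13)
```

These five numbers are exactly what an ordinary boxplot draws: the box spans Q1 to Q3, the line inside marks M, and the stems reach Min and Max.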

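BOXPLOT TERMINOLOGY
- Boxplot: the stems (whiskers) go from the box to the minimum and maximum, giving a visual representation of the five-number summary.
- Modified boxplot: the stems use the 1.5 × IQR criterion for outliers. They extend only to the most extreme observations that are not suspected outliers, and suspected outliers are plotted individually.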

OUTLIER IDENTIFICATION
There is no clear-cut rule, but the textbook recommends the 1.5 × IQR criterion. An observation is a suspected outlier if it is:
- more than 1.5 × IQR lower than Q1, or
- more than 1.5 × IQR higher than Q3.
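The 1.5 × IQR criterion amounts to two "fences", Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. Any observation outside them is flagged. The minimal sketch below assumes quartiles are the medians of the lower and upper halves of the data, with the median excluded when n is odd. The function name and data are hypothetical.

```python
def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def suspected_outliers(xs):
    """Flag observations outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (the 1.5 x IQR criterion)."""
    s = sorted(xs)
    n = len(s)
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in xs if x < low_fence or x > high_fence]


data = [2, 4, 5, 7, 9, 11, 130]   # hypothetical data
print(suspected_outliers(data))   # Q1 = 4, Q3 = 11, IQR = 7, fences -6.5 and 21.5 -> [130]
```

Being flagged only makes a value a *suspected* outlier. Whether it is an error or a genuine extreme value still needs to be checked against the context of the data.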

LINEAR TRANSFORMATION
A linear transformation is the transformation of a variable from x to $x_{\text{new}}$ as follows:

$$x_{\text{new}} = a + bx$$

where x is some quantitative measurement and a and b are constants.
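A familiar example of $x_{\text{new}} = a + bx$ is converting temperatures from Celsius to Fahrenheit, which uses a = 32 and b = 1.8. The sketch below applies the transformation to every value in a list. The function name `linear_transform` and the temperatures are hypothetical.

```python
def linear_transform(xs, a, b):
    """Apply x_new = a + b * x to every value."""
    return [a + b * x for x in xs]


# Celsius to Fahrenheit is a linear transformation with a = 32, b = 1.8
celsius = [0, 10, 25, 100]
fahrenheit = linear_transform(celsius, a=32, b=1.8)
print(fahrenheit)   # [32.0, 50.0, 77.0, 212.0]
```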


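HOW LINEAR TRANSFORMATIONS AFFECT CENTRAL LOCATION
Recall that the measures of central location are the mean (x̄) and the median (M). Under the linear transformation $x_{\text{new}} = a + bx$, both change in the same way as the individual values:

$$\bar{x}_{\text{new}} = a + b\bar{x}, \qquad M_{\text{new}} = a + bM$$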
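For $x_{\text{new}} = a + bx$, the mean and median of the new values should equal a + b times the original mean and median. The sketch below checks this numerically using the standard-library `statistics` module. It uses the Celsius-to-Fahrenheit conversion (a = 32, b = 1.8) on hypothetical temperatures.

```python
import statistics


def linear_transform(xs, a, b):
    """Apply x_new = a + b * x to every value."""
    return [a + b * x for x in xs]


celsius = [0, 10, 25, 100]           # hypothetical temperatures
a, b = 32, 1.8                       # Celsius -> Fahrenheit
fahrenheit = linear_transform(celsius, a, b)

# Summary of the transformed data vs. a + b * (summary of the original data)
print(statistics.mean(fahrenheit), a + b * statistics.mean(celsius))
print(statistics.median(fahrenheit), a + b * statistics.median(celsius))
```

Each printed pair should agree, up to floating-point rounding.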

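HOW LINEAR TRANSFORMATIONS AFFECT SPREAD
Recall that the measures of spread are the standard deviation (s) and the interquartile range (IQR). Under $x_{\text{new}} = a + bx$, adding the constant a shifts every value by the same amount and leaves the spread unchanged. Multiplying by b scales the spread by |b|:

$$s_{\text{new}} = |b|\,s, \qquad \mathrm{IQR}_{\text{new}} = |b|\,\mathrm{IQR}$$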
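For $x_{\text{new}} = a + bx$, the standard deviation and IQR of the new values should be |b| times the originals, whatever the value of a. The sketch below checks this with a negative b, which reverses the order of the data but still stretches the spread by |b| = 2. The constants a = 10 and b = −2 and the data are hypothetical. The quartiles follow the halves-of-the-data convention (median excluded when n is odd), which is an assumption.

```python
from statistics import stdev  # sample standard deviation (divides by n - 1)


def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def iqr(xs):
    """Q3 - Q1, using medians of the lower and upper halves (median excluded when n is odd)."""
    s = sorted(xs)
    n = len(s)
    return median(s[(n + 1) // 2 :]) - median(s[: n // 2])


x = [2, 4, 5, 7, 9, 11, 13]        # hypothetical data
a, b = 10, -2                        # hypothetical constants; b < 0 reverses the order
x_new = [a + b * v for v in x]

print(iqr(x), iqr(x_new))            # 7 14: the IQR is multiplied by |b| = 2
print(stdev(x_new) / stdev(x))       # ratio is |b| = 2, up to floating-point rounding
```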
