Chapter 2: Descriptive Statistics

Describing the Shape of a Distribution 

Descriptive statistics: The science of describing the important characteristics of a population or sample
o Central tendency: The middle of the data set
o Variability: The spread of the data
o Shape: How the data set is distributed over various values
o Outliers: Unusually large or small measurements that lie far from the rest of the data set



Graphical methods: Methods of depicting data sets to study relationships between different variables



Stem-and-leaf display (pg 26): Displays the overall pattern of the data by grouping it into classes
o Shows the variation from class to class, as well as the amount and distribution of data in each class
o Best for small to moderately sized data sets
o Steps for creating a stem-and-leaf display:
   1. Decide which unit will be used for the stems and the leaves. Choose units for the stems so that there will be somewhere between 5 and 20 stems.
   2. Place the stems in a column with the smallest stem at the top and the largest at the bottom.
   3. Enter the leaf for each measurement into the row corresponding to the proper stem. Each leaf should be a single digit (round a measurement first if its leaf would otherwise have more than one digit).
   4. Rearrange the leaves so that they are in increasing order from left to right.
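The four steps above can be sketched in Python as a minimal illustration, not the textbook's procedure verbatim. The function name `stem_and_leaf`, its `leaf_unit` parameter (the place value of the leaf digit), and the data are hypothetical, and the sketch assumes non-negative values.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Return a stem-and-leaf display as text (hypothetical helper).

    leaf_unit is the place value of the leaf digit: 1 for ones, 0.1 for tenths.
    Assumes non-negative values.
    """
    rows = defaultdict(list)
    for v in values:
        # Steps 1 and 3: round to whole leaf units so each leaf is one digit,
        # then split into stem (all but the last digit) and leaf (last digit).
        # Note: Python's round() sends exact halves to the nearest even number.
        stem, leaf = divmod(round(v / leaf_unit), 10)
        rows[stem].append(leaf)
    lines = []
    # Step 2: every stem from smallest (top) to largest (bottom), even empty ones.
    for stem in range(min(rows), max(rows) + 1):
        # Step 4: leaves in increasing order from left to right.
        leaves = "".join(str(d) for d in sorted(rows.get(stem, [])))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)

# Hypothetical data set
print(stem_and_leaf([31, 42, 37, 45, 52, 33, 48, 29]))
```

For this data the display has stems 2 to 5 with leaf rows 9, 137, 258, and 2.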

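Frequency distribution (pg 27–32): A table that groups data into classes (for example, the classes defined by the stems of a stem-and-leaf display) and lists how many measurements fall in each class
o Frequency: The number of measurements in a class (for a stem, the number of leaves in its row)
o Histogram: A graphical portrayal of a data set that shows the data set's distribution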

o Steps for creating a histogram:
   1. Find the number of classes.
      The number of classes should be the smallest whole number k that makes the quantity $2^k$ greater than the number of measurements.
   2. Find the class length.
      $\text{class length} = \frac{\text{largest measurement} - \text{smallest measurement}}{\text{number of classes}}$, usually rounded up to a convenient value.
   3. Form non-overlapping classes of equal width.
      o Lower boundary of the 1st class: the smallest data value
      o Lower boundary of the 2nd and later classes: the upper boundary of the previous class
      o Upper boundary of any class: the lower boundary of the class + the class length
      The last class may be an open class, with no upper boundary.
   4. Tally and count the number of measurements in each class.
      o Frequency: The number of measurements in each class
      o Relative frequency: The proportion of all measurements that fall in the class (class frequency ÷ total number of measurements); multiply by 100 to express it as a percent
      o Relative frequency distribution: A list of all the data classes and their relative frequencies
   5. Graph the histogram.
      o Plot each frequency (or relative frequency) as the height of a rectangle positioned over the corresponding class.
      o The x-axis can be labeled with the class boundaries or with the class midpoints.
      o Use the class boundaries to separate adjacent rectangles.
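The five histogram steps can be sketched as follows. All names are hypothetical. As an assumption, the class length is the range divided by k and rounded up to a whole number (one common way to choose a "convenient" length), and the last class is closed at its upper boundary so the largest value is always counted. A row of asterisks stands in for each rectangle of the graph.

```python
import math

def number_of_classes(n):
    """Step 1: smallest whole number k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def frequency_distribution(data):
    """Steps 2-4: return (lower, upper, frequency, relative frequency) per class."""
    n = len(data)
    k = number_of_classes(n)
    smallest, largest = min(data), max(data)
    # Step 2 (assumed rounding rule): range / k, rounded up to a whole number.
    length = max(1, math.ceil((largest - smallest) / k))
    # Step 3: equal-width classes; each lower boundary is the previous upper one.
    bounds = [(smallest + i * length, smallest + (i + 1) * length) for i in range(k)]
    # Step 4: tally. A value equal to a boundary goes into the class that starts
    # there; the last class also keeps its upper boundary so no value is lost.
    counts = [0] * k
    for x in data:
        counts[min(int((x - smallest) // length), k - 1)] += 1
    return [(lo, hi, freq, freq / n) for (lo, hi), freq in zip(bounds, counts)]

def text_histogram(table):
    """Step 5 stand-in: one row of '*' per class, length = frequency."""
    return "\n".join(f"{lo:>4} to {hi:<4} {'*' * freq}" for lo, hi, freq, _ in table)

# Hypothetical data set of n = 20 measurements
data = [12, 15, 18, 21, 22, 25, 27, 28, 30, 31,
        33, 34, 35, 36, 38, 40, 42, 45, 48, 52]
table = frequency_distribution(data)
print(text_histogram(table))
```

With n = 20, $2^4 = 16$ is not greater than 20 but $2^5 = 32$ is, so k = 5; the range 52 − 12 = 40 gives a class length of 8, and the class frequencies come out as 3, 4, 6, 4, 3 (relative frequencies 0.15, 0.20, 0.30, 0.20, 0.15).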

Skewness:
o Normally distributed: Symmetrical, bell-shaped normal curve (no skew)
o Positively (right) skewed: Longer tail to the right
o Negatively (left) skewed: Longer tail to the left

Dot plots (pg 33–34): A number line on which each data value is represented by a dot placed above the corresponding scale value
o Useful (along with stem-and-leaf displays) for detecting outliers
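A text version of a dot plot can be sketched like this, assuming integer data so that each scale value gets its own column. The function name and the data (with 20 as a deliberate outlier) are hypothetical.

```python
from collections import Counter

def dot_plot(values):
    """Return a text dot plot: one '*' per value stacked above its scale value.

    Assumes integer data; each integer on the scale gets a 3-character column.
    """
    counts = Counter(values)
    scale = range(min(values), max(values) + 1)
    rows = []
    # Build the stacks from the top level down to 1.
    for level in range(max(counts.values()), 0, -1):
        row = ""
        for x in scale:
            mark = "*" if counts[x] >= level else " "
            row += f"{mark:>3}"
        rows.append(row.rstrip())
    # The number line: one label per scale value.
    rows.append("".join(f"{x:>3}" for x in scale))
    return "\n".join(rows)

# Hypothetical data; the lone 20 sits far from the cluster at 5 to 9.
print(dot_plot([5, 6, 6, 7, 7, 7, 8, 8, 9, 20]))
```

The isolated dot above 20 is the kind of gap that makes an outlier easy to spot.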

Describing Central Tendency
Population parameter:
o A constant value calculated from all of the population measurements that describes some aspect of the population
o Central tendency: The center, or middle, of the data set
o Point estimate: A one-number estimate of the value of a population parameter

Sample statistic: A number calculated from the sample measurements that describes some aspect of the sample (a descriptive measure of the sample)
o Since measuring every unit in a population is often difficult, samples and sample statistics are used to estimate population parameters.

Population mean (μ): Average of the population measurements
o Calculated by adding all the population measurements and dividing the sum by the number of measurements: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where N = population size
o A constant value

Sample mean (x̄): Average of the sample measurements
o $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n = sample size and $x_i$ = the sample measurements
o x̄ is the point estimate of the population mean μ; because its value changes from sample to sample, it is a random variable.
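The difference between μ and x̄ can be sketched with a small, hypothetical population: μ is computed once from every measurement, while x̄ is recomputed for each random sample. The population values, sample size, and seed are illustrative assumptions.

```python
import random

def mean(values):
    """Sum of the measurements divided by the number of measurements."""
    return sum(values) / len(values)

# Hypothetical population of N = 10 measurements.
population = [18, 22, 25, 19, 31, 27, 24, 30, 21, 33]
mu = mean(population)  # population mean: a fixed constant (25.0 here)

# Hypothetical random samples of size n = 4, drawn without replacement.
rng = random.Random(42)  # seeded only so the run is repeatable
sample_means = [mean(rng.sample(population, 4)) for _ in range(5)]

# Each x-bar is a point estimate of mu; the values generally differ from
# sample to sample, which is why x-bar is treated as a random variable.
print("mu =", mu)
print("x-bar values:", sample_means)
```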

Median (Md): The measurement that divides a population or sample into two roughly equal halves
o Arrange the measurements of the population or sample in increasing order.
o If the number of measurements is odd, the median is the middle measurement in the ordering.
o If the number of measurements is even, the median is the average of the two middle measurements in the ordering.
o The median is more resistant to outliers than the mean, so it is often the better measure of central tendency when the data are skewed or contain outliers.
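A minimal sketch of the median rule (odd vs. even number of measurements), with hypothetical data showing its resistance to an outlier compared with the mean:

```python
def median(values):
    """Median: middle value of the ordered data (average of two middles if n is even)."""
    ordered = sorted(values)  # arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle measurement
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middles

# Hypothetical data: replacing 7 with the outlier 70 moves the mean, not the median.
clean = [3, 4, 5, 6, 7]
with_outlier = [3, 4, 5, 6, 70]
for data in (clean, with_outlier):
    print(sum(data) / len(data), median(data))
```

The mean jumps from 5.0 to 17.6 while the median stays at 5.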

Mode (Mo): The measurement that occurs most frequently in a population or sample
o Bimodal: Exactly two modes
o Multimodal: More than two modes
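A sketch that finds the mode(s) and labels the result. The helper name is hypothetical, and treating a data set in which every value occurs once as having no mode is a convention assumed here, not something stated above.

```python
from collections import Counter

def modes(values):
    """Return (list of modes, label). Hypothetical helper.

    Assumed convention: if every value occurs exactly once, there is no mode.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return [], "no mode"
    found = [v for v, c in counts.items() if c == top]  # all values tied for most frequent
    if len(found) == 1:
        label = "unimodal"
    elif len(found) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return found, label

print(modes([1, 2, 2, 3]))           # one mode
print(modes([1, 1, 2, 2, 3]))        # two modes
print(modes([1, 1, 2, 2, 3, 3, 4]))  # three modes
```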

Comparing the mean, median, and mode:
o When the curve is bell-shaped: mean = median = mode
o When the curve is right skewed: mean > median > mode
o When the curve is left skewed:

mean < median