COMM121: BASIC BUSINESS STATISTICS SUMMARY

CHAPTER 1: INTRODUCTION AND DATA COLLECTION

Variable – a characteristic of an item or individual
Data – the observed values of variables
Population – all the members of a group about which you want to draw a conclusion, e.g. 18 year olds
Sample – a portion of the population selected, e.g. a hundred 18 year olds
Parameter – a numerical measure that describes a characteristic of a population
Statistic – a numerical measure that describes a characteristic of a sample
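The difference between a parameter and a statistic can be seen in a small sketch. The population below is simulated and entirely hypothetical (heights in cm of 1,000 eighteen-year-olds, a made-up example, not data from the course); the point is only that the parameter is computed from the whole population and the statistic from a sample of it.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated heights (cm) of 1,000 eighteen-year-olds.
population = [random.gauss(170, 10) for _ in range(1000)]

# Sample: a portion of the population, here 100 members chosen at random.
sample = random.sample(population, 100)

# Parameter describes the population; statistic describes the sample.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.1f}")
print(f"Statistic (sample mean):     {sample_mean:.1f}")
```

The two means are usually close but not identical, which is why a statistic is used to estimate a parameter rather than treated as equal to it.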
Descriptive statistics – collecting, summarising and presenting a set of data
Inferential statistics – drawing conclusions about a population based on sample data

Identifying sources of data
4 important sources of data:
1) Data distributed by an organisation/individual
2) A designed experiment
3) A survey
4) An observation
Primary source – provides data collected by the person analysing it
Secondary source – provides data collected by another person/organisation

Types of variables
Categorical – data that can be sorted into categories
Numerical – digits or numbers are used as values
  Discrete – whole-number data, e.g. number of people in a class
  Continuous – can have decimals or fractions, e.g. weight
Levels of measurement and types of measurement scales
Categorical data is measured on a nominal or ordinal scale; numerical data is measured on an interval or ratio scale.
Nominal data – only sorted into categories, i.e. you can't tell if one is better or worse, e.g. gender, hair colour
Ordinal data – categories that have a ranking, e.g. school grades
Interval data – an ordered scale where differences are meaningful but there is no true zero, e.g. temperature, year
Ratio data – an ordered scale with a true zero point, e.g. age, weight
CHAPTER 2: PRESENTING DATA IN TABLES AND CHARTS

Graphing data
Categorical data – summary table, pie charts, bar charts
Numerical data – ordered array, stem and leaf, frequency distributions and cumulative distributions (histogram, polygon, ogive)

CATEGORICAL DATA
[Figure: example of a pie chart and a bar chart]

NUMERICAL DATA
Ordered array – a sequence of data ranked in order, e.g. 1, 2, 3, 4, 5
Stem and leaf – a quick way to see distribution details in a data set
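As a rough illustration of a stem and leaf display, the sketch below splits each value into a stem (the tens) and a leaf (the units). The data set and the function name stem_and_leaf are hypothetical, and the code assumes non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines: stem = tens digit(s), leaf = units digit.

    Assumes non-negative whole numbers (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for v in sorted(values):              # sorting gives the ordered array
        groups[v // 10].append(v % 10)
    lines = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Hypothetical data set, not taken from the notes.
data = [23, 12, 35, 21, 18, 27, 30, 12, 41, 29]
for line in stem_and_leaf(data):
    print(line)
# 1 | 228
# 2 | 1379
# 3 | 05
# 4 | 1
```

Each row keeps the individual values visible while its length shows how many values fall in that range.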
Frequency distribution – a summary table in which data is arranged into numerically ordered classes or intervals
Advantages: good for summarising, condenses data into a more useful form, gives a visual representation
Steps for a frequency distribution:
1) Sort data into ascending order
2) Find the range (largest - smallest)
3) Select the number of classes (the number of numerical categories)
4) Compute the class interval (width) = range / number of classes
5) Set limits, e.g. 0, 10, 20, 30, 40, 50
6) Set class midpoints, e.g. 5, 15, 25, 35, 45
7) Make a table, for example:
Class        Frequency   %    Cumulative frequency   Cumulative %
0 to < 10    6           30   6                      30
10 to < 20   2           10   8                      40
20 to < 30   6           30   14                     70
30 to < 40   4           20   18                     90
40 to < 50   0           0    18                     90
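Class – the class boundaries (limits)
Frequency – the number of data values that fall in the class
Percentage – the class frequency as a percentage of the total number of values
Cumulative frequency – the class frequency plus the frequencies of all previous classes
Cumulative % – the cumulative frequency as a percentage of the total number of values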
Class intervals and class boundaries
Each data value belongs to only one class
Each class grouping has the same width
Range = largest value - smallest value
Width of interval = range / number of desired class groupings
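The steps for building a frequency distribution can be sketched in code. This is a minimal illustration, not the course's prescribed method: the 20 data values, the function name frequency_distribution and the choice of 5 classes are all assumptions made for the example. Values outside the class limits are ignored.

```python
def frequency_distribution(values, lower, width, num_classes):
    """Return rows of (class, frequency, %, cumulative frequency, cumulative %)."""
    counts = [0] * num_classes
    for v in values:
        index = int((v - lower) // width)    # the one class that contains v
        if 0 <= index < num_classes:         # values outside the limits are skipped
            counts[index] += 1
    total = sum(counts)
    rows, running = [], 0
    for i, freq in enumerate(counts):
        running += freq                      # cumulative frequency so far
        start = lower + i * width
        label = f"{start} to < {start + width}"
        rows.append((label, freq, 100 * freq / total, running, 100 * running / total))
    return rows

# Hypothetical data set of 20 values, not taken from the notes.
data = [3, 7, 8, 12, 15, 21, 22, 24, 26, 28,
        29, 31, 33, 35, 38, 41, 44, 45, 47, 49]

data_range = max(data) - min(data)           # 49 - 3 = 46
num_classes = 5                              # chosen for the example
raw_width = data_range / num_classes         # 46 / 5 = 9.2
width = 10                                   # 9.2 rounded up to a convenient width

rows = frequency_distribution(data, lower=0, width=width, num_classes=num_classes)
for label, freq, pct, cum, cum_pct in rows:
    print(f"{label:<12}{freq:>4}{pct:>7.1f}{cum:>5}{cum_pct:>7.1f}")
```

Because every value lands in exactly one class, the last cumulative frequency equals the number of values and the last cumulative % is 100.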
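Histogram – a graph of the data in a frequency distribution. Unlike a bar chart, there are no gaps between the bars, and each bar represents a class rather than a category
Frequency polygon – a line graph that joins the midpoints of the tops of the bars of a histogram, i.e. each class midpoint plotted against its frequency
Ogive – a graph of cumulative data, i.e. the cumulative % for each class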
Scatter diagrams – used to examine possible relationships between 2 numerical variables
Time-series plot – used to study patterns or changes over time
CHAPTER 3: NUMERICAL DESCRIPTIVE MEASURES
Describing data
Central tendency – arithmetic mean, median, mode, geometric mean
Quartiles
Variation – range, interquartile range, variance, standard deviation, coefficient of variation
Shape – skewness
Arithmetic mean – another word for the average; it is affected by extreme values (OUTLIERS)
Median – the actual "middle" number of the ordered data, e.g. 3 in 1 2 3 4 5
Median formula: (n + 1)/2 = median position/rank (Note: if the sample size is even, the median is the average of the 2 middle values)
Mode – the value that occurs most frequently
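A short sketch of these three measures, using small made-up data sets. The function name median_by_rank is hypothetical; it applies the (n + 1)/2 rank rule directly, while the mean and mode come from Python's standard statistics module.

```python
import statistics

def median_by_rank(values):
    """Find the median using the (n + 1)/2 position in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    rank = (n + 1) / 2                     # 1-based position/rank
    if rank.is_integer():                  # odd n: a single middle value
        return ordered[int(rank) - 1]
    lower = ordered[int(rank) - 1]         # even n: the 2 middle values
    upper = ordered[int(rank)]
    return (lower + upper) / 2

# Hypothetical data sets, not taken from the notes.
plain = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]

print(statistics.mean(plain), median_by_rank(plain))                # 3 3
print(statistics.mean(with_outlier), median_by_rank(with_outlier))  # 12 3
print(median_by_rank([1, 2, 3, 4]))                                 # 2.5
print(statistics.mode([1, 2, 2, 3, 4]))                             # 2
```

Replacing 5 with the outlier 50 moves the mean from 3 to 12 but leaves the median at 3, which is the sense in which the mean is affected by extreme values.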
Quartiles – split the ordered data into 4 segments with an equal number of values per segment
Q1 – first quartile, rank = (n + 1)/4
Q2 (median) – second quartile, rank = (n + 1)/2
Q3 – third quartile, rank = 3(n + 1)/4
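The rank formulas can be turned into a small sketch. The notes do not say what to do when a rank is not a whole number, so the code below interpolates between the two neighbouring values; that is an assumption of this example (one common convention), not necessarily the rule used in the course. The data set and function names are hypothetical, and at least 3 values are assumed.

```python
def value_at_rank(ordered, rank):
    """Value at a 1-based rank in ordered data.

    Fractional ranks are linearly interpolated (an assumption of this sketch).
    """
    lower_index = int(rank) - 1
    fraction = rank - int(rank)
    if fraction == 0:
        return ordered[lower_index]
    step = ordered[lower_index + 1] - ordered[lower_index]
    return ordered[lower_index] + fraction * step

def quartiles(values):
    """Return (Q1, Q2, Q3) using the ranks (n+1)/4, (n+1)/2 and 3(n+1)/4.

    Assumes at least 3 values so that every rank falls inside the data.
    """
    ordered = sorted(values)
    n = len(ordered)
    ranks = ((n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4)
    return tuple(value_at_rank(ordered, r) for r in ranks)

# Hypothetical data set with n = 11, so the ranks are 3, 6 and 9.
data = [11, 12, 13, 16, 16, 17, 18, 21, 22, 25, 30]
print(quartiles(data))   # (13, 17, 22)
```

With n = 11 every rank is a whole number, so the quartiles are simply the 3rd, 6th and 9th ordered values and no interpolation is needed.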