COMM121: BASIC BUSINESS STATISTICS SUMMARY

CHAPTER 1: INTRODUCTION AND DATA COLLECTION

Variable – a characteristic of items or individuals
Data – the observed values of variables
Population – all the members of a group about which you want to draw a conclusion, e.g. 18 year olds
Sample – a portion of the population selected, e.g. a hundred 18 year olds
Parameter – a numerical measure that describes a characteristic of a population
Statistic – a numerical measure that describes a characteristic of a sample

Descriptive statistics – methods that summarise and present the data actually collected
Inferential statistics – conclusions about a population based on sample data

Identifying sources of data
4 important sources of data:
- data distributed by an organisation/individual
- designed experiment
- survey
- observation
Primary source – provides information collected by the data analyser
Secondary source – provides data collected by another person/organisation

Types of variables
Categorical – data that can be sorted into categories
Numerical – digits or numbers are used as values
  Discrete – whole-number (counted) data, e.g. people in class
  Continuous – can have decimals or fractions, e.g. weight

Levels of measurement and types of measurement scales

Categorical data
  Nominal data – only sorted into categories, i.e. can’t tell if one is better or worse, e.g. gender, hair colour
  Ordinal data – categories that have a ranking, e.g. school grades
Numerical data
  Interval data – differences between values are meaningful but there is no true zero, e.g. temperature, year
  Ratio data – a true zero exists, e.g. age, weight

CHAPTER 2: PRESENTING DATA IN TABLES AND CHARTS

CATEGORICAL DATA
Categorical data can be presented as:
- Summary table
- Graphs: pie charts, bar charts

[Figure: example of a pie chart and a bar chart]

NUMERICAL DATA
Numerical data can be presented as:
- Ordered array
- Stem and leaf
- Frequency distributions, cumulative distributions
- Graphs: histogram, polygon, ogive

Ordered array – a sequence of data ranked in order, e.g. 1, 2, 3, 4, 5
Stem and leaf – quick way to see distribution details in a data set

Frequency distribution – summary table in which data is arranged into numerically ordered classes or intervals
Advantages: summarises the data, condenses it into a more useful form, and allows a visual representation

Steps for a frequency distribution:
1) Sort data into ascending order
2) Find the range (largest – smallest)
3) Select the number of classes (the number of numerical categories)
4) Compute the class interval (width) = range / number of classes
5) Set limits, e.g. 0, 10, 20, 30, 40, 50
6) Set class midpoints, e.g. 5, 15, 25, 35, 45
7) Make a table as in the example:

Class        Frequency   %    Cumulative frequency   Cumulative %
0 to < 10    6           30   6                      30
10 to < 20   2           10   8                      40
20 to < 30   6           30   14                     70
30 to < 40   4           20   18                     90
40 to < 50   0           0    18                     90

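Class – the class boundaries (limits)
Frequency – number of data values in the class
Percentage – class frequency as a percentage of the total number of values
Cumulative frequency – class frequency plus the frequencies of all previous classes
Cumulative % – cumulative frequency as a percentage of the total number of values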

Class intervals and class boundaries
- Each data value belongs to only one class
- Each class grouping has the same width
Range = largest value – smallest value
Width of interval = range / number of desired class groupings
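The steps above can be sketched in Python. This is a minimal illustration, not part of the course material: the function name `frequency_distribution`, its parameters, and the ten data values are all hypothetical. The range and the raw width are computed first, and the width is then rounded up to 10 by hand so that the limits match the 0, 10, 20, ... example.

```python
def frequency_distribution(data, lower, width, num_classes):
    """Build a frequency table with equal-width classes (hypothetical helper)."""
    n = len(data)
    rows = []
    cumulative = 0
    for i in range(num_classes):
        lo = lower + i * width
        hi = lo + width
        # Each value belongs to exactly one class: lo <= x < hi
        freq = sum(1 for x in data if lo <= x < hi)
        cumulative += freq
        rows.append({
            "class": (lo, hi),
            "midpoint": (lo + hi) / 2,
            "frequency": freq,
            "percent": 100 * freq / n,
            "cum_frequency": cumulative,
            "cum_percent": 100 * cumulative / n,
        })
    return rows


# Made-up data, sorted into ascending order (step 1)
data = sorted([24, 3, 41, 15, 7, 29, 12, 33, 18, 21])

data_range = data[-1] - data[0]   # step 2: largest - smallest
raw_width = data_range / 5        # step 4: range / number of classes

# Steps 5 to 7: limits 0, 10, ..., 50 (width rounded up to 10 for convenience)
rows = frequency_distribution(data, lower=0, width=10, num_classes=5)

for row in rows:
    lo, hi = row["class"]
    print(f"{lo} to < {hi}", row["frequency"], row["percent"],
          row["cum_frequency"], row["cum_percent"])
```

Using the half-open rule `lo <= x < hi` is one way to guarantee that a value sitting exactly on a limit (such as 10) is counted in only one class.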

Histogram – graph of the data in a frequency distribution
Difference from a bar graph: there are no gaps between the bars, and the horizontal axis shows numerical classes rather than categories

Frequency polygon – line graph formed by joining the midpoints of the tops of the bars on a histogram, i.e. frequency plotted against class midpoint

The Ogive – graph of the cumulative data (cumulative frequency or cumulative %)

Scatter diagrams – used to examine possible relationships between 2 numerical variables

Time-series plot – used to study patterns or changes over time

CHAPTER 3: NUMERICAL DESCRIPTIVE MEASURES

Describing data
- Central tendency: arithmetic mean, median, mode, geometric mean
- Quartiles
- Variation: range, interquartile range, variance, standard deviation, coefficient of variation
- Shape: skewness

Arithmetic mean – the average: the sum of the values divided by the number of values; it is affected by extreme values (OUTLIERS)
Median – the middle value of the data ranked in order, e.g. in 1 2 3 4 5 the median is 3
Median position/rank = (n + 1)/2
(Note: if the sample size is even, the median is the average of the 2 middle values)
Mode – the value that occurs most frequently
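The definitions above can be checked with a short Python sketch using the standard `statistics` module. The data values and the helper name `median_position` are made up for illustration. The second list swaps one value for an outlier to show that the mean moves while the median does not.

```python
from statistics import mean, median, mode


def median_position(n):
    """Rank of the median in ranked data of size n: (n + 1)/2."""
    return (n + 1) / 2


values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]   # same data, but 5 replaced by an outlier

print(mean(values), median(values))              # mean and median agree here
print(mean(with_outlier), median(with_outlier))  # mean is pulled up, median is not

# Odd sample size: the position is a whole number (the 3rd value)
print(median_position(len(values)))

# Even sample size: position 2.5, so average the 2nd and 3rd values
even_values = [1, 2, 3, 4]
print(median_position(len(even_values)), median(even_values))

# Mode: the value that occurs most frequently
print(mode([1, 2, 2, 3]))
```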

Quartiles – split the ranked data into 4 segments with an equal number of values per segment
Q1 (first quartile) rank = (n + 1)/4
Q2 (median) rank = (n + 1)/2
Q3 (third quartile) rank = 3(n + 1)/4
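As a rough sketch, the rank formulas can be applied in Python as follows. The function names and the data are hypothetical. The notes do not say how to handle a fractional rank, so linear interpolation between the two neighbouring values is used here as an assumption; some textbooks round the rank instead. The sketch also assumes at least 3 values, so that every rank falls inside the data.

```python
def quartile_ranks(n):
    """Ranks of Q1, Q2 and Q3 in ranked data of size n."""
    return (n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4


def value_at_rank(sorted_data, rank):
    """Value at a 1-based rank.

    Assumption: fractional ranks are interpolated linearly between
    the two neighbouring values (not stated in the notes).
    """
    whole = int(rank)
    fraction = rank - whole
    lower = sorted_data[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_data[whole]
    return lower + fraction * (upper - lower)


# Made-up data, ranked in ascending order
data = sorted([11, 12, 13, 16, 16, 17, 18, 21, 22])

ranks = quartile_ranks(len(data))              # n = 9
q1, q2, q3 = (value_at_rank(data, r) for r in ranks)
print(ranks, q1, q2, q3)
```

With n = 9 the ranks are 2.5, 5 and 7.5, so Q1 and Q3 fall halfway between two data values and Q2 is the 5th value.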