Business Statistics
Types of Statistics
Descriptive: collecting, summarising and describing data. The process includes collecting data, presenting data and characterising data (e.g. with the mean).
Inferential: drawing conclusions and/or making decisions about a population based on sample data. The process includes estimation (e.g. estimating the population mean using the sample mean) and hypothesis testing.
Population: the collection of all possible individuals, objects or measurements of interest.
Sample: a portion of the population of interest.
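To make the descriptive/inferential split concrete, here is a minimal sketch that uses the sample mean to estimate a population mean. The population below is made up (1,000 random "order values" between 10 and 100, fixed seed) purely for illustration. In practice the population mean is usually unknown, and that is why we estimate it.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up order values (illustration only).
random.seed(42)
population = [round(random.uniform(10, 100), 2) for _ in range(1000)]

# Descriptive: summarise the data we hold (here, the whole population).
population_mean = statistics.mean(population)

# Inferential: draw a sample of 50 and use its mean to estimate the population mean.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean (estimate): {sample_mean:.2f}")
```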
Four Levels of measurement of data
Nominal level: data that is classified into non-overlapping categories and cannot be arranged in any particular order. E.g. eye color, gender, brand of TV.
Ordinal level: data that is classified into non-overlapping categories in which ranking is implied. E.g. test performance grades (A, B, C, D, E, F).
Interval level: an ordered scale in which the difference between measurements is a meaningful quantity, but the measurements do not have a true zero point (zero is just another point on the scale, not an absence of the quantity). E.g. shoe size, temperature in °C.
Ratio level: the interval level with an inherent zero starting point (zero means none of the quantity). E.g. price, distance travelled, time taken.
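One way to remember which average suits each level is to treat the levels as ordered, with each level keeping the properties of the ones below it. The sketch below encodes the rules these notes give for the mode, median and mean. The `Level` enum and `valid_averages` function are hypothetical names used only for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    # Ordered so that each level has the properties of the levels below it.
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def valid_averages(level: Level) -> list:
    """Measures of central tendency that apply at a given level (per these notes)."""
    measures = ["mode"]                 # mode: all four levels
    if level >= Level.ORDINAL:
        measures.append("median")       # median: ordinal, interval and ratio
    if level >= Level.INTERVAL:
        measures.append("mean")         # mean: not used for nominal or ordinal data
    return measures


print(valid_averages(Level.NOMINAL))    # ['mode']
print(valid_averages(Level.RATIO))      # ['mode', 'median', 'mean']
```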
Graphical presentation of data
Scatter plot: a graph of paired data from two continuous variables, used to explore the relationship between them.
Pie chart: a circular display of data in which the whole pie represents 100% of the data and each slice represents one category's share.
Bar chart: a graph in which each category is shown by a bar (categorical data).
Histogram: a type of vertical bar chart in which the area of each bar is proportional to the frequency of the corresponding class interval (quantitative data).
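Before a histogram or pie chart can be drawn, the underlying numbers must be computed. The sketch below builds a frequency table for hypothetical test scores using equal-width class intervals [lower, upper), and calculates pie-chart percentages for hypothetical TV brands. The data, the interval choice and the helper `class_frequencies` are assumptions for illustration. Plotting itself is left out.

```python
from collections import Counter

# Hypothetical test scores (quantitative data) for a histogram.
scores = [45, 52, 58, 61, 64, 67, 70, 72, 75, 78, 81, 85, 88, 93]


def class_frequencies(data, start, width, n_classes):
    """Count how many values fall in each interval [lower, lower + width)."""
    freqs = {}
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        freqs[(lower, upper)] = sum(lower <= x < upper for x in data)
    return freqs


# With equal widths, each bar's height (and area) is proportional to its frequency.
hist = class_frequencies(scores, start=40, width=10, n_classes=6)

# Hypothetical TV brands (categorical data) for a pie or bar chart.
brands = ["A", "B", "A", "C", "A", "B"]
counts = Counter(brands)
shares = {brand: 100 * n / len(brands) for brand, n in counts.items()}  # percent of the pie

print(hist)
print(shares)
```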
Data collection
Cross-sectional: data is collected at one point in time. E.g. the prices of several stocks on the same day.
Time series: data is collected over a period of time. E.g. a stock's price on each of many days.
Measures of Central Tendency
Mean
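Population mean: μ = ∑x / N, where ∑ indicates the operation of adding, ∑x is the sum of the x values in the population and N is the number of values in the population. (For a sample, x̄ = ∑x / n.)
The mean is affected by extreme values and is not used for ordinal or nominal data.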
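A minimal sketch of μ = ∑x / N, with a few hypothetical income figures. Adding one extreme value shows how strongly the mean reacts to outliers.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


incomes = [30, 32, 35, 38, 40]      # hypothetical data
print(mean(incomes))                # 35.0
print(mean(incomes + [400]))        # about 95.83: pulled up by one extreme value
```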
Median
The middle value of a set of numbers after they have been arranged in order (for an even number of values, the average of the two middle values). It applies to ordinal, interval and ratio data and is unaffected by extremely large and extremely small values.
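The sketch below sorts the data and takes the middle value, averaging the two middle values when the count is even. The data is hypothetical. Compare its resistance to the extreme value of 400 with the mean sketch above.

```python
def median(values):
    """Middle value after sorting; average of the two middle values for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([30, 32, 35, 38, 40]))         # 35
print(median([30, 32, 35, 38, 40, 400]))    # 36.5: barely moved by the extreme value
```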
Mode
The most frequently occurring value in a data set. It applies to all levels of data measurement (nominal, ordinal, interval and ratio).
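Because the mode only counts occurrences, it also works on nominal data such as eye color. The sketch returns every most-frequent value, since a data set can have more than one mode. Python's standard library offers a similar function, `statistics.multimode`. The helper name `modes` and the data are illustrative.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often (there may be more than one)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]


print(modes(["brown", "blue", "brown", "green"]))   # ['brown'] (nominal data)
print(modes([1, 2, 2, 3, 3]))                        # [2, 3]
```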
Measures of Dispersion/Spread
Measures of dispersion describe the spread/variability of a set of data. They include the range, variance, standard deviation, coefficient of variation (CV), z-scores and the interquartile range (IQR).
Range
The difference between the largest and the smallest values in a set of data.
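A one-line sketch of the range (largest value minus smallest value) on hypothetical data. The name `data_range` avoids shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


print(data_range([12, 7, 19, 3, 15]))   # 16
```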
Variance
The average of the squared deviations from the arithmetic mean. For a population: σ² = ∑(x − μ)² / N.
Standard Deviation
The square root of the variance: σ = √σ². The more widely a population's values are dispersed around the mean, the larger the population variance and standard deviation will be. Note that (x − μ) is the deviation of a single value from the mean, not the standard deviation.
Steps to calculate the population variance and standard deviation:
1. Calculate the mean.
2. Calculate the distance of each observation from the mean and square that difference.
3. Sum all the squared differences calculated in step 2.
4. Divide the sum of the squared differences by the number of observations (N) to get the variance.
5. Take the square root of the variance calculated in the previous step to get the standard deviation.
For example, for a class's test scores, the variance and standard deviation indicate the variability in the class's performance.
Sample variance and standard deviation: if the data is a sample, then we use s² = ∑(x − x̄)² / (n − 1) and s = √s².
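The five steps above map directly onto code. In this sketch, a `sample` flag switches the divisor from N to n − 1. The flag and the test scores are illustrative assumptions. The results should match the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import math


def variance(values, sample=False):
    """Population variance by default; sample variance (n - 1 divisor) if sample=True."""
    n = len(values)
    m = sum(values) / n                               # step 1: mean
    squared_devs = [(x - m) ** 2 for x in values]     # step 2: squared distances
    total = sum(squared_devs)                         # step 3: sum them
    return total / (n - 1 if sample else n)           # step 4: divide by N (or n - 1)


def std_dev(values, sample=False):
    return math.sqrt(variance(values, sample))        # step 5: square root


scores = [2, 4, 4, 4, 5, 5, 7, 9]                     # hypothetical test scores
print(variance(scores), std_dev(scores))              # 4.0 2.0 (population)
```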
Coefficient of variation
The ratio of the standard deviation to the mean, expressed as a percentage: CV = (σ / μ) × 100%. It is a measure of relative dispersion, used to compare the variability of data sets with different means.
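The sketch below shows why the CV is useful. The two hypothetical data sets have the same standard deviation but very different means, so their relative dispersion differs by a factor of ten. It assumes the population standard deviation (`statistics.pstdev`). With a sample, `statistics.stdev` would be used instead.

```python
import statistics


def coefficient_of_variation(values):
    """CV = population standard deviation / mean, expressed as a percentage."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


# Hypothetical data: same spread, very different means.
stock_a = [10, 12, 8, 10]         # mean 10
stock_b = [100, 102, 98, 100]     # mean 100

cv_a = coefficient_of_variation(stock_a)   # about 14.14%
cv_b = coefficient_of_variation(stock_b)   # about 1.41%
print(f"{cv_a:.2f}% vs {cv_b:.2f}%")
```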
Z-Score
A z-score represents the number of standard deviations a value is above or below the mean of a set of numbers: z = (x − μ) / σ. A negative z-score indicates that the item is below average, and a positive z-score means that the item is above average.
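A minimal sketch of z = (x − μ) / σ applied to every value in a hypothetical data set, using the population mean and standard deviation. A value equal to the mean gets z = 0.

```python
import statistics


def z_scores(values):
    """Population standard deviations each value lies above (+) or below (-) the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, population SD 2
print(z_scores(scores))             # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```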