Lecture 1: Introduction; data types; sampling data

Business Analytics
Analytics is the discovery and communication of meaningful patterns in data.

Framework for conducting statistical analyses: DCOVA
• Define the problem or objective and the data required
• Collect the required data (in an appropriate manner)
• Organise the data: "clean" it, prepare it for analysis, tabulate and summarise it
• Visualise the data
• Analyse the data

Statistics
Descriptive - Collecting, summarising, presenting and organising data (often numbers and percentages)
• Collection: survey
• Presentation: tables and graphs
Predictive - Using a model and data to make forecasts of outcomes
Inferential - Using data collected from a small group to draw conclusions about a larger group (drawing a meaningful conclusion based on a sample)
• Estimation: estimate the population average amount spent using the sample average amount spent
• Hypothesis testing: test the claim that the population average amount spent in one group is larger than that in another group

Basic Vocabulary
Variable - a characteristic of an item or individual. Data on one or more variables is what you analyse when you use a statistical method
Data - the observed values or outcomes of one or more variables
Operational Definition - variables should have universally accepted meanings that are clear to everyone associated with an analysis; the clearly defined meaning is the operational definition
Population - all the items or individuals about which you want to draw a conclusion. The population is the "large group"
Sample - the portion of a population selected for analysis. The sample is the "small group"
Parameter - a numerical measure that describes a relevant characteristic of a population
Statistic - a numerical measure that describes a characteristic of a sample
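The population/sample and parameter/statistic vocabulary above can be illustrated with a small sketch. It assumes a made-up population of customer spending amounts (the figures, the seed and the sample size of 100 are all hypothetical) and uses only Python's standard library. In practice the population mean is usually unknown, which is why the sample mean is used to estimate it.

```python
import random
import statistics

# Hypothetical population: amount spent (in dollars) by each of 10,000 customers.
# The values are simulated purely for illustration.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [round(rng.uniform(5, 200), 2) for _ in range(10_000)]

# Parameter: a numerical measure describing the population.
population_mean = statistics.mean(population)

# Sample: a portion of the population selected for analysis
# (here a simple random sample drawn without replacement).
sample = rng.sample(population, k=100)

# Statistic: a numerical measure describing the sample,
# used here as an estimate of the population mean.
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

Running the sketch with a different seed gives a different sample mean, while the population mean stays fixed for a given population; that sample-to-sample variation is what inferential methods have to account for.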
Types of Variables
Categorical (qualitative) variables have values that can only be placed into categories
Numerical (quantitative) variables have values that represent actual numeric quantities:
o Discrete variables arise from a counting process
o Continuous variables arise from a measuring process
Sometimes discrete variables with many possible outcomes are treated like continuous variables, e.g. prices
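As a rough illustration of how the variable type guides the choice of summary, the sketch below uses a tiny, made-up set of survey records (the field names `payment_method`, `items_bought` and `time_in_store_min` and all values are hypothetical). The categorical variable is summarised with frequency counts, while the discrete and continuous variables are summarised with a total and a mean.

```python
from collections import Counter
import statistics

# Hypothetical survey records; names and values are invented for illustration.
records = [
    {"payment_method": "card", "items_bought": 3, "time_in_store_min": 12.5},
    {"payment_method": "cash", "items_bought": 1, "time_in_store_min": 4.2},
    {"payment_method": "card", "items_bought": 5, "time_in_store_min": 20.1},
    {"payment_method": "card", "items_bought": 2, "time_in_store_min": 9.8},
]

# Categorical: values are category labels, so count how often each one occurs.
payment_counts = Counter(r["payment_method"] for r in records)

# Discrete numerical: arises from counting, so a total is meaningful.
total_items = sum(r["items_bought"] for r in records)

# Continuous numerical: arises from measuring, so a mean is meaningful.
mean_time = statistics.mean([r["time_in_store_min"] for r in records])

print(payment_counts)
print(total_items)
print(round(mean_time, 2))
```

The variable type here is decided by the analyst from the operational definition, not from how the value is stored: a postcode stored as a number would still be categorical.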
Levels of Data Measurement (Measurement scales)
Nominal - lowest level of measurement
• Labels are used to distinguish different categories that have no order
Ordinal
• Cannot be analysed arithmetically (e.g. averaging the levels is not meaningful)
• Used to classify and to indicate rank or order
• Often represents an underlying scale, e.g. quality
• Differences between levels are not comparable (e.g. star ratings between restaurants/movies)
Interval
• Numerical in nature, with ordering
• The gaps between values have a consistent meaning
• There is no true zero (e.g. scaled marks)
Ratio - highest level of measurement
• Same as interval, except that zero is a true value
• Zero represents absence of the thing being measured

Data Collection
Primary Sources - the analyst collects the data
Secondary Sources - the analyst is not the data collector
• Data distributed by an organisation or an individual
o Financial data on a company provided by investment services
o Industry or market data from market research firms and trade associations
o Stock prices, weather conditions, sports statistics
o Google
• A designed experiment
o Consumer testing of different versions of a product
o Quality testing (e.g. material)
o Market testing: which product promotion to use
o Testing web page designs
• A survey
o Political polls
o Determining customer satisfaction with a recent product or service experience
o Internet polls
• An observational study