Big Data, Small Shops
Analytics for the 99%
Jennifer MacCormack
CASE8

February 5, 2015, Seattle, WA

Key Objectives
• Big Data defined (and demystified)
• Challenges and roadblocks
• Starting down that road
• Data
• Simple, effective ways to create insights and ignite discussion
• The Last Mile: Building a data culture
• Q&A

Big Data Evolution of the term and its current uses

Big Data Defined Sort of…

• UC Berkeley School of Information surveyed 40 thought leaders in various industries on how they defined “big data.”
• Answers varied:
  • Volume, velocity, and variety
  • Technologies that can collect and process massive datasets
  • Data mining
  • Merging disparate and often complex datasets to create insights
  • Belief that more is better and answers will flow from “pool of ones and zeros”
  • Attitude by organizations that combining data from many sources can lead to better decisions.

To me, “big data” is the situation where an organization can (arguably) say that they have access to what they need to reconstruct, understand, and model the part of the world that they care about. Using their big data, then, they can (try to) predict future states of the world, optimize their processes, and otherwise be more effective and rational in their activities. Harlan Harris, Director, Data Science at Education Advisory Board. President and Co-Founder, Data Community DC

Key Words • Access – Data, data, data • Reconstruct – What do things look like? • Understand – What trends, patterns, relationships are apparent? • Model – What can we build from our data to better understand?

Key Words • Optimize – Transforming data insights into better decisions • Be more effective – Data-driven decisions help us be more focused and strategic in our efforts • Rational – Testing assumptions, making the case, and using data to inform best practices.

Challenges for the start-up

Common Roadblocks What are yours?

• Time • Technical expertise • Budget/resources • Access to the data • Support from the top • Data inertia or “doing the same thing we’ve always done because it works.”

Starting Down That Road Some analytics is better than none, so start somewhere (and simple)

Define Business Needs Why Does Your Organization Need Analytics?

• Gap analysis
  • Where are you least efficient in your processes?
  • Where are you strained from lack of human or technical resources?
  • What opportunities do you often miss?
  • How often do you use your data to inform decisions?
  • How are you capturing donor trends?
  • How well do you know your donors based on the data?

Your Start-up Toolkit • Curiosity • Entrepreneurial attitude • Data • Excel • Internet access • Project (gap analysis)

Data

Knowing it, getting it, and preparing it

How are you getting it? • Common barrier to analytics is access to the data • How are you communicating with your colleagues in IT? • Are you sharing your needs and the purpose of your analysis? • Do you know the data that’s important to you?

What’s enough? What data can get you some insight?

• Even just gift data is enough to get you started
• Some common ways of looking at giving based on gifts
  • Total gift count
  • Lifetime credit
  • Giving frequency
  • Largest gift
  • Giving in last 3 or 5 years
  • Rate of increase over number of years
• These convey interest and loyalty, and are common predictors of whether someone will make a major gift
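
As a rough illustration of the gift-based measures listed above, here is a minimal Python sketch. The gift records, the function name `summarize_gifts`, and the exact definitions are assumptions made for illustration: lifetime credit is treated as the simple sum of gift amounts, giving frequency as the share of calendar years since the first gift in which the donor gave, and the "last N years" window is counted in calendar years. Your own definitions may differ.

```python
from datetime import date

# Hypothetical gift records for one donor: (gift date, amount in dollars).
gifts = [
    (date(2010, 6, 1), 100.0),
    (date(2012, 12, 15), 250.0),
    (date(2013, 3, 9), 250.0),
    (date(2014, 11, 30), 1000.0),
]

def summarize_gifts(gifts, as_of, window_years=5):
    """Return simple giving measures for one donor's list of (date, amount)."""
    amounts = [amount for _, amount in gifts]
    years_with_gift = {gift_date.year for gift_date, _ in gifts}
    first_year = min(years_with_gift)
    # Calendar years from the first gift through the as-of year, inclusive.
    years_active = as_of.year - first_year + 1
    # Gifts count toward the window if made after this calendar year.
    cutoff_year = as_of.year - window_years
    return {
        "total_gift_count": len(gifts),
        "lifetime_credit": sum(amounts),  # assumed: plain sum of gift amounts
        "largest_gift": max(amounts),
        # Assumed definition: share of active years with at least one gift.
        "giving_frequency": len(years_with_gift) / years_active,
        "giving_last_n_years": sum(
            amount for gift_date, amount in gifts if gift_date.year > cutoff_year
        ),
    }

summary = summarize_gifts(gifts, as_of=date(2015, 2, 5))
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same measures can be built in Excel with COUNT, SUM, and MAX over a donor's gift rows; the sketch only shows that gift data alone is enough to produce them.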

And for spice! The cumin and turmeric of variables

• Engagement
  • Events
  • Volunteer
  • Awards
  • Surveys
  • Click-thru data

• Relationships
  • Greek affiliations
  • Parents
  • Friends, associates

• Engagement and relationship data are some of the most predictive of giving behavior. If you don’t record them, you should.

Simple Approaches to Big Insights Finding patterns and relationships in your data

Describe and summarize with impact • Mean, median, and mode • Percentages • Correlations

Mean, median, and mode

Life is hard in Westeros
But it follows a nearly normal distribution
[Figure: Age of Death in Westeros]

Age of Death in Westeros: • Mean: 35.6 • Median: 36 • Mode: 40

Teenage years in Westeros are rough…

What it all means…
• Mean is the sum of all values divided by the number of values.
  • Very susceptible to extreme outliers, the Bill Gates in your dataset

• Median is the middle value when all data is arranged in order from lowest to highest: 1 2 3 4 5 6 7 8 9 10 11 12
  • Not vulnerable to skewed data

• Mode is the most frequently occurring value in a dataset
  • Not very good for continuous data
  • Ignores other data…the mode may be very far away from the median, or there may be two modes

Best practices • Mean – when you know your data is normally distributed • Median – when your data is skewed; if there is a large spread between your mean and median, use the median.
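
To see why the median is the safer summary for skewed data, here is a small Python sketch using the standard `statistics` module. The gift amounts are made up for illustration; the point is only how each measure reacts when a single extreme gift is added.

```python
from statistics import mean, median, mode

# Hypothetical gift amounts in dollars.
typical_gifts = [25, 50, 50, 75, 100, 100, 100, 150, 200]
# The same gifts plus one extreme outlier (the "Bill Gates" of the dataset).
with_outlier = typical_gifts + [1_000_000]

for label, data in [("typical", typical_gifts), ("with outlier", with_outlier)]:
    print(
        f"{label}: mean={mean(data):,.2f} "
        f"median={median(data)} mode={mode(data)}"
    )
```

The mean jumps by several orders of magnitude once the outlier is included, while the median and mode stay where they were. In Excel the equivalent functions are AVERAGE, MEDIAN, and MODE.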

Helpful Tip: LOOK AT YOUR DATA. Use a histogram to see how it’s distributed
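
If you do not have a charting tool handy, even a quick text histogram shows the shape of a distribution. The sketch below is one simple way to do it in Python; the function name `text_histogram` and the ages are hypothetical and are not the data behind the Westeros chart.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Count integer values per bin and return lines like '  30-39  | ####'."""
    # Map each value to the lower edge of its bin, then count per bin.
    counts = Counter((value // bin_width) * bin_width for value in values)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start:>4}-{start + bin_width - 1:<4}"
        # A Counter returns 0 for empty bins, so gaps still get a row.
        lines.append(f"{label}| {'#' * counts[start]}")
    return lines

# Hypothetical ages, used only to show the output format.
ages = [12, 17, 23, 28, 31, 33, 35, 36, 36, 38, 40, 40, 41, 44, 52, 58, 67]
for line in text_histogram(ages, bin_width=10):
    print(line)
```

A long tail on one side of the printed bars is the cue to report the median rather than the mean.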

Mode
Best used for categorical data
[Figure: Deaths Among Houses of Westeros]

Use the mean and median to observe donor trends

And test assumptions
Sometimes your biggest donors are not who you think
[Figure: Major Gift Donors to School of Dance. Number of records (0 to 70) by donor type: Alumni, Friends, Faculty/Staff, Corp/Fdn]

Percentages Measure of ratio

Percentages provide context
You provide the value of something relative to the whole
[Figure: Giving Velocity and Gift Size in FY15. Percentages (0% to 50%) for each velocity category (Significant decline, Moderate decline, Flat, Moderate increase, Significant increase) by gift size: 10K-24.9K and 25K+]

Percentages provide context You provide the value of something relative to the whole

8 major gifts made to the College of Arts & Sciences in 2014 were from donors ranked in the Top 1% of the model.

Is this a high number of gifts? Is this a good number? There’s no context. It might be okay…then again, it may not.

89% of all major gifts made to the College of Arts & Sciences were from donors ranked in the Top 1% of the model.

Here we have context! We know that a high proportion of major gifts were from donors rated most likely to make them.
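
The arithmetic behind that context is a single division. In the Python sketch below, the count of 8 gifts comes from the example above; the total of 9 major gifts is an assumption, chosen only because 8 of 9 rounds to the quoted 89%. The actual total is not stated on the slide.

```python
def share_of_total(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole

# 8 comes from the example above. The total of 9 is an ASSUMED figure,
# picked only because 8 of 9 rounds to the quoted 89%.
top_1_percent_gifts = 8
all_major_gifts = 9

share = share_of_total(top_1_percent_gifts, all_major_gifts)
print(f"{share:.0f}% of major gifts came from donors ranked in the Top 1%")
```

Reporting the count and the percentage together ("8 of 9, or 89%") gives the audience both the scale and the context.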

Correlation Finding relationships in your data

Correlation
• Allows us to describe the association between two variables with a single score
• Correlation coefficient ranges from -1 to 1.
  • 1 is a perfect positive correlation
  • -1 is a perfect negative correlation
  • 0 is no linear association between the variables

• Easy to calculate in Excel with the CORREL function
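
For readers working outside Excel, the Pearson correlation coefficient (the statistic CORREL returns) can be sketched in a few lines of Python. The function name `pearson_r` and the sample values are hypothetical; they are not the survey data discussed in this deck.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical donors: number of relationships and lifetime credit in dollars.
relationships = [0, 1, 1, 2, 3, 4, 5]
lifetime_credit = [100, 250, 400, 800, 1500, 3000, 6000]

print(f"r = {pearson_r(relationships, lifetime_credit):.2f}")
```

A value near 1 or -1 suggests a strong linear association; it says nothing about which variable, if either, drives the other.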

Relationships and giving A love story

• Looked at number of relationships stated in survey and lifetime credit • Correlation indicates that there is a positive relationship between giving and number of relationships • This helped strengthen the case that we needed to capture more relationships.

• Question: Are people who have more than one relationship with someone they attended the UW with more generous donors?

[Figure: # of Existing Relationships and Lifetime Credit. Number of relationships (0 to 6) plotted against lifetime credit (0 to 8000)]

Correlation Caution and recommendations

• Correlation indicates that a relationship exists between variables. This is not a causal relationship!
• Know your question first
  • Follow a hunch
  • Test an assumption
• Know what relationship you want to describe

• Don’t just throw spaghetti on the wall to see what sticks. That’s not a strategy and that’s not how analytics is done.

Spurious correlations

Simple scores, big impact RFM, Engagement, and Velocity

RFM Recency, Frequency, Monetary

• The “love stamp” of giving scores from 0-100
• A donor’s philanthropic affinity to your organization
• A gauge of current and evolving loyalty
• A great way to summarize common giving markers
  • How recently has a donor been giving
  • How frequently has a donor been giving
  • How much has a donor been giving

RFM Ingredients and recipe by Josh Birkholz
• Recency = 20%: Last gift year
• Frequency = 30%: Frequency * years giving multiplier
• Monetary = 50%: Largest gift + Lifetime credit

Recency: • Gave in last year = 20 • Gave last 1-2 years = 15 • Gave last 2-3 years = 10

Part I: Frequency % = Gift Count/([This year]-[First year of giving]) • 100% = 10 • 90%-99% = 8 • 80%-89% = 6 • 70%-79% = 4 • 60%-69% = 2
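
The recency points and Part I of the frequency score listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the full recipe: the function names are hypothetical, "gave in last year" is read as one year or less since the last gift, any case outside the listed tiers scores 0, a frequency above 100% is capped at the top tier, and a donor whose first gift was this year is treated as having a one-year span.

```python
def recency_points(years_since_last_gift):
    """Recency tiers as listed above; unlisted cases score 0 (assumption)."""
    if years_since_last_gift < 0:
        raise ValueError("years_since_last_gift cannot be negative")
    if years_since_last_gift <= 1:
        return 20
    if years_since_last_gift <= 2:
        return 15
    if years_since_last_gift <= 3:
        return 10
    return 0  # ASSUMED: no tier is listed above for older gifts

def frequency_percent(gift_count, this_year, first_year):
    """Gift count divided by years since the first gift, as a percentage."""
    # ASSUMED guard: treat a first gift made this year as a one-year span.
    years = max(this_year - first_year, 1)
    return 100.0 * gift_count / years

def frequency_part1_points(percent):
    """Part I frequency tiers as listed above; below 60% scores 0 (assumption)."""
    for floor, points in [(100, 10), (90, 8), (80, 6), (70, 4), (60, 2)]:
        if percent >= floor:
            return points
    return 0  # ASSUMED: no tier is listed above for lower percentages

# Hypothetical donor: 4 gifts since 2010, last gift one year ago, scored in 2015.
pct = frequency_percent(gift_count=4, this_year=2015, first_year=2010)
print(recency_points(1), pct, frequency_part1_points(pct))
```

The same tiers translate directly into nested IF or lookup-table formulas in Excel.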