Big Data, Small Shops Analytics for the 99% Jennifer MacCormack CASE8
February 5, 2015 Seattle, WA
Key Objectives
• Big Data defined (and demystified)
• Challenges and roadblocks
• Starting down that road
• Data
• Simple, effective ways to create insights and ignite discussion
• The Last Mile: Building a data culture
• Q&A
Big Data Evolution of the term and its current uses
Big Data Defined Sort of…
• UC Berkeley School of Information surveyed 40 thought leaders in various industries on how they defined “big data.”
• Answers varied:
  • Volume, velocity, and variety
  • Technologies that can collect and process massive datasets
  • Data mining
  • Merging disparate and often complex datasets to create insights
  • Belief that more is better and answers will flow from a “pool of ones and zeros”
  • Attitude by organizations that combining data from many sources can lead to better decisions
To me, “big data” is the situation where an organization can (arguably) say that they have access to what they need to reconstruct, understand, and model the part of the world that they care about. Using their big data, then, they can (try to) predict future states of the world, optimize their processes, and otherwise be more effective and rational in their activities.
Harlan Harris, Director, Data Science at Education Advisory Board; President and Co-Founder, Data Community DC
Key Words • Access – Data, data, data • Reconstruct – What do things look like? • Understand – What trends, patterns, relationships are apparent? • Model – What can we build from our data to better understand?
Key Words • Optimize – Transforming data insights into better decisions • Be more effective – Data-driven decisions help us be more focused and strategic in our efforts • Rational – Testing assumptions, making the case, and using data to inform best practices.
Challenges for the start-up
Common Roadblocks What are yours?
• Time • Technical expertise • Budget/resources • Access to the data • Support from the top • Data inertia or “doing the same thing we’ve always done because it works.”
Starting Down That Road Some analytics is better than none, so start somewhere (and simple)
Define Business Needs Why Does Your Organization Need Analytics?
• Gap analysis
  • Where are you least efficient in your processes?
  • Where are you strained from lack of human or technical resources?
  • What opportunities do you often miss?
  • How often do you use your data to inform decisions?
  • How are you capturing donor trends?
  • How well do you know your donors based on the data?
Your Start-up Toolkit • Curiosity • Entrepreneurial attitude • Data • Excel • Internet access • Project (gap analysis)
Data
Knowing it, getting it, and preparing it
How are you getting it? • Common barrier to analytics is access to the data • How are you communicating with your colleagues in IT? • Are you sharing your needs and the purpose of your analysis? • Do you know the data that’s important to you?
What’s enough? What data can get you some insight?
• Even just gift data is enough to get you started
• Some common ways of looking at giving based on gifts
  • Total gift count
  • Lifetime credit
  • Giving frequency
  • Largest gift
  • Giving in last 3 or 5 years
  • Rate of increase over number of years
• These convey interest and loyalty, and are common predictors of whether someone will make a major gift
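As a rough illustration of how a few of these gift variables can be derived from nothing more than a list of gifts, here is a minimal Python sketch. The gift records, the `summarize_gifts` helper, and the field names are all hypothetical; lifetime credit is assumed here to be a simple sum of gift amounts, and giving frequency is assumed to mean the share of years with at least one gift since the first gift. Rate of increase is left out.

```python
from datetime import date

# Hypothetical gift records for one donor: (gift date, amount).
gifts = [
    (date(2009, 6, 1), 50.0),
    (date(2011, 12, 15), 100.0),
    (date(2013, 3, 10), 250.0),
    (date(2014, 11, 30), 500.0),
]

def summarize_gifts(gifts, as_of_year, window=5):
    """Return basic giving variables for one donor's gift history."""
    amounts = [amount for _, amount in gifts]
    years = sorted({d.year for d, _ in gifts})
    # Years since the first gift, counting both end years (an assumption).
    span = as_of_year - years[0] + 1
    # Total given in the last `window` calendar years, including as_of_year.
    recent = sum(a for d, a in gifts if d.year > as_of_year - window)
    return {
        "total_gift_count": len(gifts),
        "lifetime_credit": sum(amounts),
        "largest_gift": max(amounts),
        "giving_frequency": len(years) / span,  # share of years with a gift
        "recent_giving": recent,
    }

summary = summarize_gifts(gifts, as_of_year=2014)
```

The same calculations can be done in Excel with COUNT, SUM, and MAX over a donor's gift rows; the point is that one gift table already yields several usable variables.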
And for spice! The cumin and turmeric of variables
• Engagement
  • Events
  • Volunteer
  • Awards
  • Surveys
  • Click-thru data
• Relationships • Greek affiliations • Parents • Friends, associates
• Engagement and relationship data are some of the most predictive of giving behavior. If you don’t record them, you should.
Simple Approaches to Big Insights Finding patterns and relationships in your data
Describe and summarize with impact • Mean, median, and mode • Percentages • Correlations
Mean, median, and mode
Life is hard in Westeros But it follows a nearly normal distribution Age of Death in Westeros
Age of Death in Westeros: • Mean: 35.6 • Median: 36 • Mode: 40
Teenage years in Westeros are rough…
What it all means… • Mean is the sum of all values divided by the number of values. • Very susceptible to extreme outliers, the Bill Gates in your dataset
• Median is the middle value when all data are arranged in order from lowest to highest: 1 2 3 4 5 6 7 8 9 10 11 12 (with an even count, the median is the average of the two middle values, here 6.5) • Not vulnerable to skewed data
• Mode is the most frequently occurring value in a dataset • Not very good for continuous data • Ignores other data…the mode may be very far away from the median, or there may be two modes
Best practices • Mean – when you know your data is normally distributed • Median – when your data is skewed; if there is a large spread between your mean and median, use the median.
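To see why the median is the safer choice for skewed data, here is a small Python sketch using the standard `statistics` module. The gift amounts are made up for illustration; the last value plays the role of the single very large donor.

```python
import statistics

# Hypothetical gift amounts; the last value is the extreme outlier.
gifts = [25, 50, 50, 75, 100, 150, 1_000_000]

mean_gift = statistics.mean(gifts)      # pulled far upward by the outlier
median_gift = statistics.median(gifts)  # middle value, unaffected by it
mode_gift = statistics.mode(gifts)      # most frequent value

print(f"mean={mean_gift:,.2f} median={median_gift} mode={mode_gift}")
```

Here the median is 75 and the mode is 50, while the mean lands above 140,000, a value that describes none of the typical donors. In Excel the equivalent functions are AVERAGE, MEDIAN, and MODE.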
Helpful Tip: LOOK AT YOUR DATA. Use a histogram to see how it’s distributed
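A histogram does not need special software. As a minimal sketch, the Python below bins values and prints one row of `#` marks per bin. The `text_histogram` helper and the ages are hypothetical (they are not the data behind the Westeros slides), and the helper assumes whole-number values.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Return text histogram lines, one per bin of width bin_width."""
    # Map each value to the lower edge of its bin and count per bin.
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(bins), max(bins) + bin_width, bin_width):
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        lines.append(f"{label} {'#' * bins[start]}")
    return lines

# Hypothetical ages for illustration only.
ages = [3, 14, 16, 17, 22, 25, 31, 33, 36, 36,
        38, 40, 40, 40, 44, 47, 52, 58, 63, 71]
histogram = text_histogram(ages, bin_width=10)
print("\n".join(histogram))
```

A roughly symmetric hump suggests the mean is a fair summary; a long tail to one side is the cue to report the median instead.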
Mode Best used for categorical data Deaths Among Houses of Westeros
Use the mean and median to observe donor trends
And test assumptions
Sometimes your biggest donors are not who you think
[Figure: Major Gift Donors to School of Dance. Number of Records (axis 0 to 70) by donor type: Alumni, Friends, Faculty/Staff, Corp/Fdn]
Percentages Measure of ratio
Percentages provide context
You provide the value of something relative to the whole
[Figure: Giving Velocity and Gift Size in FY15. Percentage axis from 0% to 50%; gift size bands: 10K-24.9K, 25K+; series: Significant decline, Moderate decline, Flat, Moderate increase, Significant increase]
Percentages provide context You provide the value of something relative to the whole
8 major gifts made to the College of Arts & Sciences in 2014 were from donors ranked in the Top 1% of the model.
Is this a high number of gifts? Is this a good number? There’s no context. It might be okay…then again, it may not.
89% of all major gifts made to the College of Arts & Sciences were from donors ranked in the Top 1% of the model.
Here we have context! We know that a high proportion of major gifts were from donors rated most likely to make them.
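The arithmetic behind that context is a single division. In this minimal Python sketch, the count of 8 comes from the slide, while the total of 9 major gifts is an assumption chosen because it is consistent with the 89% figure; the `share_of_total` helper is hypothetical.

```python
def share_of_total(part, whole):
    """Return part as a percentage of whole, rounded to the nearest percent."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole)

top_1_pct_gifts = 8   # from the slide
all_major_gifts = 9   # ASSUMPTION: a total consistent with the slide's 89%

share = share_of_total(top_1_pct_gifts, all_major_gifts)
print(f"{share}% of major gifts came from Top 1% donors")
```

Reporting the count alongside the percentage ("8 of 9, or 89%") is often the clearest option when the total is small.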
Correlation Finding relationships in your data
Correlation • Allows us to describe the association between two variables with a single score • Correlation coefficient ranges from -1 to 1. • 1 is a perfect positive correlation • -1 is a perfect negative correlation • 0 is no linear association between the variables
• Easy to calculate in Excel with the CORREL function
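For readers working outside Excel, the same Pearson correlation coefficient can be computed in a few lines of Python. This is a sketch of the standard formula, not a description of how CORREL is implemented; the `pearson_r` helper and the donor data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical donors: number of relationships and lifetime credit.
relationships = [0, 1, 1, 2, 3, 4, 5]
lifetime_credit = [100, 250, 400, 900, 1500, 3000, 5200]

r = pearson_r(relationships, lifetime_credit)
print(f"r = {r:.2f}")
```

A value near 1 means the two columns rise together; it says nothing about whether one causes the other.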
Relationships and giving A love story
• Looked at number of relationships stated in survey and lifetime credit • Correlation indicates that there is a positive relationship between giving and number of relationships • This helped strengthen the case that we needed to capture more relationships.
• Question: Are people who have more than one relationship with someone they attended the UW with more generous donors?
[Figure: # of Existing Relationships and Lifetime Credit. Number of Relationships (0 to 6) plotted against Lifetime Credit (0 to 8000)]
Correlation Caution and recommendations
• Correlation indicates that a relationship exists between variables. This is not a causal relationship!
• Know your question first
  • Follow a hunch
  • Test an assumption
• Know what relationship you want to describe
• Don’t just throw spaghetti on the wall to see what sticks. That’s not a strategy and that’s not how analytics is done.
Spurious correlations
Simple scores, big impact RFM, Engagement, and Velocity
RFM Recency, Frequency, Monetary
• The “love stamp” of giving scores from 0-100
• A donor’s philanthropic affinity to your organization
• A gauge of current and evolving loyalty
• A great way to summarize common giving markers
  • How recently has a donor been giving
  • How frequently has a donor been giving
  • How much has a donor been giving
RFM Ingredients and recipe by Josh Birkholz
• Recency = 20%: Last gift year
• Frequency = 30%: Frequency * years giving multiplier
• Monetary = 50%: Largest gift + Lifetime credit
Recency
• Gave in last year = 20
• Gave last 1-2 years = 15
• Gave last 2-3 years = 10
Part I: Frequency % = Gift Count / ([This year] - [First year of giving])
• 100% = 10
• 90%-99% = 8
• 80%-89% = 6
• 70%-79% = 4
• 60%-69% = 2
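The recency and Part I frequency lookups above can be sketched in Python as follows. Several choices here are assumptions rather than part of the recipe: the function names, the handling of band boundaries (for example, treating exactly 1 year as "1-2 years"), the default of 0 points for any band the slide does not list, treating frequencies above 100% as the top band, and counting a first-year donor as one year of giving to avoid dividing by zero. The sketch stops at Part I and does not combine the components into a final score.

```python
def recency_points(years_since_last_gift, default=0):
    """Recency points from the slide's bands; `default` is an ASSUMPTION."""
    if years_since_last_gift < 1:
        return 20
    if years_since_last_gift < 2:
        return 15
    if years_since_last_gift < 3:
        return 10
    return default  # bands beyond 3 years are not listed on the slide

def frequency_points(gift_count, this_year, first_year, default=0):
    """Part I frequency points; `default` is an ASSUMPTION."""
    # ASSUMPTION: a donor whose first gift was this year counts as 1 year.
    years = max(1, this_year - first_year)
    pct = 100 * gift_count / years
    for floor, points in ((100, 10), (90, 8), (80, 6), (70, 4), (60, 2)):
        if pct >= floor:
            return points
    return default  # below 60% is not listed on the slide

# Hypothetical donor: 8 gifts since 2005, last gift about 18 months ago.
print(recency_points(1.5), frequency_points(8, 2015, 2005))
```

Keeping each component in its own small function makes it easy to adjust a band or a default once the scoring rules for your own shop are settled.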