DataCamp
Human Resources Analytics in R: Exploring Employee Data
Analyzing employee engagement
Ben Teusch, HR Analytics Consultant
What is employee engagement?
Engaged employees: those who are involved in, enthusiastic about and committed to their work and workplace. (Gallup) [1]
[1] http://news.gallup.com/poll/180404/gallup-daily-employee-engagement.aspx
The survey data
> head(survey)
# A tibble: 6 x 5
  employee_id department  engagement    salary vacation_days_taken
1           1 Sales                3 103263.64                   7
2           2 Engineering          2  80708.64                  12
3           4 Engineering          4  60737.05                  12
4           5 Engineering          3  99116.32                   7
5           7 Engineering          3  51021.64                  18
6           8 Engineering          5  98399.87                   9
Review of mutate()
> survey %>%
+   mutate(max_salary = max(salary))
# A tibble: 1,470 x 6
  employee_id department  engagement    salary vacation_days_taken max_salary
1           1 Sales                3 103263.64                   7   164072.6
2           2 Engineering          2  80708.64                  12   164072.6
3           4 Engineering          4  60737.05                  12   164072.6
4           5 Engineering          3  99116.32                   7   164072.6
5           7 Engineering          3  51021.64                  18   164072.6
# ... with 1,465 more rows
The ifelse() function
With x a single value below 10:
> if(x < 10){ "True" } else { "False" }
[1] "True"
With z a vector of four values, the first two below 10:
> if(z < 10){ "True" } else { "False" }
[1] "True"
Warning message:
In if (z < 10) { :
  the condition has length > 1 and only the first element will be used
> ifelse(z < 10, "Yes", "No")
[1] "Yes" "Yes" "No"  "No"
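The contrast between if and ifelse() can be sketched with a made-up vector. The values of z below are illustrative assumptions, not the ones used on the slide. Note also that from R 4.2.0 onward, passing a condition of length greater than one to if is an error rather than the warning shown above, so collapsing the condition with all() or any() is the safer habit.

```r
# Hypothetical values chosen for illustration; not the slide's original z
z <- c(3, 7, 12, 20)

# if expects a single TRUE/FALSE, so collapse the vector condition first
if (all(z < 10)) {
  all_small <- "True"
} else {
  all_small <- "False"
}

# ifelse() is vectorized: it returns one result per element of z
labels <- ifelse(z < 10, "Yes", "No")

print(all_small)
print(labels)
```

Here all_small answers a question about the whole vector, while labels holds one answer per element, which is why ifelse() is the version that fits inside mutate().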
ifelse() + mutate()
> survey %>%
+   mutate(takes_vacation = ifelse(vacation_days_taken > 10, "Yes", "No"))
# A tibble: 1,470 x 6
  employee_id engagement    salary vacation_days_taken takes_vacation
1           1          3 103263.64                   7 No
2           2          2  80708.64                  12 Yes
3           4          4  60737.05                  12 Yes
4           5          3  99116.32                   7 No
5           7          3  51021.64                  18 Yes
# ... with 1,465 more rows
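Since the full survey data set is not included here, the same pattern can be tried on a small stand-in. The sketch below rebuilds only the six rows printed by head(survey); the name survey_head is an assumption made for this example.

```r
library(dplyr)

# Stand-in for survey: only the six rows shown by head(survey)
survey_head <- data.frame(
  employee_id = c(1, 2, 4, 5, 7, 8),
  department = c("Sales", rep("Engineering", 5)),
  engagement = c(3, 2, 4, 3, 3, 5),
  salary = c(103263.64, 80708.64, 60737.05, 99116.32, 51021.64, 98399.87),
  vacation_days_taken = c(7, 12, 12, 7, 18, 9)
)

# Flag employees who took more than 10 vacation days
survey_flagged <- survey_head %>%
  mutate(takes_vacation = ifelse(vacation_days_taken > 10, "Yes", "No"))

print(survey_flagged)
```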
Multiple summarizes
> survey %>%
+   group_by(department) %>%
+   summarize(max_salary = max(salary))
# A tibble: 3 x 2
  department  max_salary
1 Engineering   164072.6
2 Finance       127013.2
3 Sales         143105.5
Multiple summarizes
> survey %>%
+   group_by(department) %>%
+   summarize(max_salary = max(salary),
+             min_salary = min(salary),
+             avg_salary = mean(salary))
# A tibble: 3 x 4
  department  max_salary min_salary avg_salary
1 Engineering   164072.6   45529.69   73576.35
2 Finance       127013.2   45714.07   76651.66
3 Sales         143105.5   46133.67   75073.57
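A runnable sketch of the same grouped summary, assuming a stand-in data frame built from the six rows printed by head(survey). Because it covers only six employees and two departments, its numbers will not match the full-data output above; survey_head is a name invented for this example.

```r
library(dplyr)

# Stand-in for survey: the department and salary columns of the six head() rows
survey_head <- data.frame(
  department = c("Sales", rep("Engineering", 5)),
  salary = c(103263.64, 80708.64, 60737.05, 99116.32, 51021.64, 98399.87)
)

# One output row per department, one column per summary statistic
salary_summary <- survey_head %>%
  group_by(department) %>%
  summarize(max_salary = max(salary),
            min_salary = min(salary),
            avg_salary = mean(salary))

print(salary_summary)
```

Each named argument to summarize() becomes one column, so adding another statistic is a matter of adding another name = function(column) pair.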
Let's practice!
Visualizing engagement data
Ben Teusch, HR Analytics Consultant
Visualizing several variables at once
The tidyr package
Using tidyr::gather()
library(tidyr)
data %>%
  gather(columns, key = "key", value = "value")
Using tidyr::gather()
> survey_summary
# A tibble: 3 x 3
  department  average_engagement average_promotions
1 Engineering           3.150884          0.4776275
2 Finance               3.238095          0.5396825
3 Sales                 2.807175          0.4910314
> survey_summary %>%
+   gather(average_engagement, average_promotions,
+          key = "key", value = "value")
# A tibble: 6 x 3
  department  key                    value
1 Engineering average_engagement 3.1508845
2 Finance     average_engagement 3.2380952
3 Sales       average_engagement 2.8071749
4 Engineering average_promotions 0.4776275
5 Finance     average_promotions 0.5396825
6 Sales       average_promotions 0.4910314
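In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The sketch below is an equivalent reshaping using the values printed in survey_summary; it is offered as an alternative, not as the course's method. The rows come out ordered by department rather than by key, but the content is the same.

```r
library(tidyr)

# survey_summary rebuilt from the values printed on the slide
survey_summary <- data.frame(
  department = c("Engineering", "Finance", "Sales"),
  average_engagement = c(3.150884, 3.238095, 2.807175),
  average_promotions = c(0.4776275, 0.5396825, 0.4910314)
)

# names_to and values_to play the roles of gather()'s key and value
survey_long <- survey_summary %>%
  pivot_longer(cols = c(average_engagement, average_promotions),
               names_to = "key", values_to = "value")

print(survey_long)
```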
Adding color to bar charts
> survey_gathered <- survey_summary %>%
+   gather(average_engagement, average_promotions,
+          key = "key", value = "value")
> ggplot(survey_gathered, aes(key, value, fill = department)) +
+   geom_col()
Side-by-side bar charts
> ggplot(survey_gathered, aes(key, value, fill = department)) +
+   geom_col(position = "dodge")
Adding facets
> ggplot(survey_gathered, aes(x = key, y = value, fill = department)) +
+   geom_col(position = "dodge") +
+   facet_wrap(~ key, scales = "free")
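To try the faceted chart without the course data, the gathered table can be typed in directly from the values shown earlier. This is a minimal sketch under that assumption. The scales = "free" argument gives each panel its own axes, so the promotion averages (around 0.5) are not flattened by the engagement averages (around 3).

```r
library(ggplot2)

# survey_gathered rebuilt by hand from the gathered output shown earlier
survey_gathered <- data.frame(
  department = rep(c("Engineering", "Finance", "Sales"), times = 2),
  key = rep(c("average_engagement", "average_promotions"), each = 3),
  value = c(3.1508845, 3.2380952, 2.8071749,
            0.4776275, 0.5396825, 0.4910314)
)

# Dodged bars colored by department, one panel per variable
p <- ggplot(survey_gathered, aes(x = key, y = value, fill = department)) +
  geom_col(position = "dodge") +
  facet_wrap(~ key, scales = "free")

print(p)
```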
Let's practice!
Testing differences between groups
Ben Teusch, HR Analytics Consultant
Comparing two groups
Quantifying the likelihood
The t-test
Use when the variable to compare is continuous
> t.test(tenure ~ is_manager, data = survey)

	Welch Two Sample t-test

data:  tenure by is_manager
t = -1.2158, df = 834.19, p-value = 0.2244
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.34197615  0.08037243
sample estimates:
mean in group Non-manager     mean in group Manager
                 7.376555                  7.507357
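The printed output is convenient to read, but the individual numbers can also be pulled out of the object that t.test() returns. The sketch below uses simulated data as a stand-in for survey; fake_survey, its group sizes, and its means are assumptions made for illustration, so its results will not match the output above.

```r
# Simulated stand-in for survey; all values here are made up
set.seed(42)
fake_survey <- data.frame(
  is_manager = rep(c("Non-manager", "Manager"), each = 50),
  tenure = c(rnorm(50, mean = 7.4, sd = 3), rnorm(50, mean = 7.5, sd = 3))
)

# Same formula interface as on the slide: outcome ~ grouping variable
result <- t.test(tenure ~ is_manager, data = fake_survey)

# The returned object is a list; pull out the pieces of interest
p_value <- result$p.value
conf_int <- result$conf.int

print(result$method)
print(p_value)
print(conf_int)
```

By default t.test() runs Welch's version, which does not assume equal variances in the two groups; that is why the output is titled "Welch Two Sample t-test".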
The chi-squared test
Use when the variable to compare is categorical
> chisq.test(survey$left_company, survey$is_manager)

	Pearson's Chi-squared test with Yates' continuity correction

data:  survey$left_company and survey$is_manager
X-squared = 26.275, df = 1, p-value = 1.97e-06
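chisq.test() also accepts a table of counts instead of two vectors. The sketch below uses a hypothetical 2 x 2 table (the counts are invented, not taken from survey) to show where the degrees of freedom and the expected counts come from. For a 2 x 2 table the Yates continuity correction is applied by default, which matches the title of the output above.

```r
# Hypothetical counts: rows are manager status, columns are whether they left
counts <- matrix(c(90, 10,
                   30, 20),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(is_manager = c("Non-manager", "Manager"),
                                 left_company = c("No", "Yes")))

result <- chisq.test(counts)

# Expected counts under independence: row total * column total / grand total
expected_counts <- result$expected

print(result$method)
print(result$parameter)  # degrees of freedom: (rows - 1) * (columns - 1)
print(expected_counts)
print(result$p.value)
```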
Where are the formulas?
Let's practice!