DATA VISUALIZATION WITH GGPLOT2
Case Study I Bag Plot

ggplot2 2.0
● Write your own extensions
● Extremely flexible
● Create bag plot
  ● John Tukey (box plots)
  ● 2D box plot

data set
> dim(df)
[1] 202   2
> head(df)
  type     Value
1    1  99.43952
2    1  99.76982
3    1 101.55871
4    1 100.07051
5    1 100.12929
6    1 101.71506

2 box plots
> ggplot(df, aes(x = type, Value)) +
    geom_boxplot() +
    facet_wrap(~type, ncol = 2, scales = "free")
[Figure: box plots of Value, one facet per type (1 and 2) with free scales]

slope plot
> df$ID
> ggplot(df, aes(x = type, Value, group = ID)) +
    geom_line(alpha = 0.3)
[Figure: one line per ID connecting its Value at type 1 and type 2]
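The slope plot needs an ID column that links each type 1 value to its type 2 partner. The following is a minimal sketch, not the course's own code. It assumes the 202 rows are 101 pairs stored as all type 1 rows followed by all type 2 rows in matching order; the simulated `df` is a hypothetical stand-in for the course data.

```r
library(ggplot2)

# Stand-in data (hypothetical): 101 paired observations in long format
set.seed(1)
df <- data.frame(
  type  = rep(1:2, each = 101),
  Value = c(rnorm(101, mean = 100), rnorm(101, mean = 150))
)

# Assumed pairing: row i of type 1 belongs with row i of type 2
df$ID <- rep(seq_len(nrow(df) / 2), times = 2)

# One line per ID, connecting its two values
p <- ggplot(df, aes(x = type, y = Value, group = ID)) +
  geom_line(alpha = 0.3)
```

If the rows were stored in a different order, the ID would have to come from a real key in the data rather than from row position.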

Distribution of slope
Box plot?
[Figure: distribution of slope, values roughly between 40 and 50]

2 distinct variables
> head(dat)
     group1   group2
1  99.43952 149.2896
2  99.76982 150.2569
3 101.55871 149.7533
4 100.07051 149.6525
5 100.12929 149.0484
6 101.71506 149.9550
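Moving from one Value column to two distinct variables is a reshape from long to wide format. A minimal base R sketch follows. It assumes the type 1 and type 2 rows are stored in matching order, and the simulated `df` is a hypothetical stand-in for the course data.

```r
# Stand-in long-format data (hypothetical), ordered as type 1 then type 2
set.seed(1)
df <- data.frame(
  type  = rep(1:2, each = 101),
  Value = c(rnorm(101, mean = 100), rnorm(101, mean = 150))
)

# Split the single Value column into one column per type
dat <- data.frame(
  group1 = df$Value[df$type == 1],
  group2 = df$Value[df$type == 2]
)

# With type coded as 1 and 2, the slope of each line is the difference
slope <- dat$group2 - dat$group1
```

The `slope` vector here is the quantity whose distribution the earlier slide asks about.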

Scatter plot
> ggplot(dat, aes(x = group1, y = group2)) +
    geom_point()
[Figure: scatter plot of group2 against group1]

2D density plot
> library(viridis)
> ggplot(dat, aes(x = group1, y = group2)) +
    stat_density_2d(geom = "tile", aes(fill = ..density..),
                    contour = FALSE) +
    scale_fill_viridis()
[Figure: 2D density of group2 against group1, fill mapped to density]

Bag plot
> library(aplpack)
> bagplot(dat[1:2])
[Figure: bag plot of group2 against group1, with the hull, bag and loop labelled]

aplpack
> library(aplpack)
> plot_data
> names(plot_data)
 [1] "center"      "hull.center" "hull.bag"    "hull.loop"
 [5] "pxy.bag"     "pxy.outer"   "pxy.outlier" "hdepths"
 [9] "is.one.dim"  "prdata"      "xy"          "xydata"

ggplot2
> ggplot(dat, aes(x = group1, y = group2)) +
    geom_point()
[Figure: scatter plot of group2 against group1]

ggplot2
> ggplot(dat, aes(x = group1, y = group2)) +
    stat_bag(alpha = 0.2)
[Figure: bag drawn over group2 against group1]
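The definition of `stat_bag()` is not shown on the slide. To illustrate how a custom stat layer is put together, here is a simpler stand-in that draws the convex hull of the points rather than a bag. It follows the pattern from the "Extending ggplot2" vignette; `StatChull`, `stat_chull()` and the simulated `dat` are illustrative names and data, not part of the course material.

```r
library(ggplot2)

# Hypothetical stand-in for the course's `dat`
set.seed(1)
dat <- data.frame(group1 = rnorm(101, 100), group2 = rnorm(101, 150))

# A Stat that reduces each group to the points on its convex hull
StatChull <- ggproto("StatChull", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  }
)

# Layer constructor, following the usual stat_*() pattern
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

p <- ggplot(dat, aes(x = group1, y = group2)) +
  geom_point() +
  stat_chull(alpha = 0.2)
```

A bag-plot stat would presumably follow the same shape, with `compute_group()` returning the bag or loop polygon instead of the convex hull.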

Remarks
● Useful but not popular
● Poorly understood
● Learn to use ggplot2 extensions
Case Study II Weather (Part 1)

Weather
Source: http://www.edwardtufte.com/

present
> dim(present)
[1] 153   5
> head(present, n = 4)
  month day year temp new_day
1     1   1 2016   41       1
2     1   2 2016   37       2
3     1   3 2016   40       3
4     1   4 2016   33       4
> tail(present, n = 4)
    month day year temp new_day
148     5  28 2016   79     148
149     5  29 2016   80     149
150     5  30 2016   73     150
151     5  31 2016   76     151

Time series
> ggplot(present, aes(x = new_day, y = temp)) +
    geom_line()
[Figure: line plot of temp against new_day for the current year]

past
> str(past)
'data.frame': 7645 obs. of  11 variables:
 $ month    : num  1 1 1 1 1 1 1 1 1 1 ...
 $ day      : num  1 2 3 4 5 6 7 8 9 10 ...
 $ year     : num  1995 1995 1995 1995 1995 ...
 $ temp     : num  44 41 28 31 21 27 42 35 34 29 ...
 $ new_day  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ upper    : num  51 48 57 55 56 62 52 57 54 47 ...
 $ lower    : num  17 15 16 15 21 14 14 12 21 8.5 ...
 $ avg      : num  35.6 35.4 34.9 35.1 35.9 ...
 $ se       : num  2.19 1.83 2.46 2.53 1.92 ...
 $ avg_upper: num  40.2 39.2 40 40.5 39.9 ...
 $ avg_lower: num  31 31.5 29.7 29.8 31.9 ...
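The slides do not show how the summary columns in `past` were built. One plausible reading, offered only as an assumption, is that `upper` and `lower` are the per-day extremes across years, `avg` and `se` are the per-day mean and standard error, and `avg_upper` and `avg_lower` bound a t-based 95% confidence interval. A base R sketch on simulated stand-in data:

```r
# Hypothetical stand-in: 21 years of daily temperatures
set.seed(1)
past <- expand.grid(new_day = 1:365, year = 1995:2015)
past$temp <- 50 - 25 * cos(2 * pi * past$new_day / 365) +
  rnorm(nrow(past), sd = 8)

# Summarise one day of the year across all years
summarise_day <- function(temp) {
  n   <- length(temp)
  avg <- mean(temp)
  se  <- sd(temp) / sqrt(n)
  half_width <- qt(0.975, df = n - 1) * se  # assumed 95% CI half-width
  c(upper = max(temp), lower = min(temp), avg = avg, se = se,
    avg_upper = avg + half_width, avg_lower = avg - half_width)
}

# One row of summaries per day of the year
stats <- t(sapply(split(past$temp, past$new_day), summarise_day))
stats <- data.frame(new_day = as.integer(rownames(stats)), stats)

# Attach the per-day summaries to every row, as in the str() output above
past <- merge(past, stats, by = "new_day")
```

Storing the summaries on every row repeats values, but it keeps everything in one data frame that a single `ggplot()` call can use.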

Each year separately
> ggplot(past, aes(x = new_day, y = temp, group = year)) +
    geom_line(alpha = 0.2)
[Figure: temp against new_day, one semi-transparent line per year]

present + past
> ggplot(past, aes(x = new_day, y = temp, group = year)) +
    geom_line(alpha = 0.4) +
    geom_line(data = present, aes(group = 1), col = "red")
[Figure: temp against new_day, past years as semi-transparent lines with the current year in red]

Linerange
[Figure: temp against new_day, drawn as line ranges]
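The code for the linerange slide is not shown. A hedged sketch of how such a layer could be drawn with `geom_linerange()`, assuming one row per day holding the historical extremes and the confidence bounds. The `days` data frame is simulated and purely illustrative.

```r
library(ggplot2)

# Hypothetical per-day summary (one row per day)
set.seed(1)
days <- data.frame(new_day = 1:365)
days$avg       <- 50 - 25 * cos(2 * pi * days$new_day / 365)
days$lower     <- days$avg - runif(365, 15, 25)
days$upper     <- days$avg + runif(365, 15, 25)
days$avg_lower <- days$avg - 4
days$avg_upper <- days$avg + 4

p <- ggplot(days, aes(x = new_day)) +
  # Full historical range: record low to record high
  geom_linerange(aes(ymin = lower, ymax = upper), colour = "grey80") +
  # Narrower band around the historical mean
  geom_linerange(aes(ymin = avg_lower, ymax = avg_upper), colour = "grey50") +
  labs(y = "temp")
```

Each day gets one vertical segment per layer, so the two layers together show the full range with the narrower interval drawn on top.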

Records
[Figure: temp against new_day, with points marking record days]
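How the record points were identified is not shown. A minimal sketch under the assumption that a record is a current-year day whose temperature falls outside the historical extremes for that day. The `limits` and `present` data frames are simulated stand-ins, and the `record` column name is illustrative.

```r
# Hypothetical stand-ins: historical extremes per day and the current year
set.seed(1)
limits <- data.frame(new_day = 1:365)
limits$upper <- 70 - 25 * cos(2 * pi * limits$new_day / 365)
limits$lower <- limits$upper - 40
present <- data.frame(new_day = 1:151)
present$temp <- limits$upper[1:151] - 20 + rnorm(151, sd = 9)

# Line up each current-year day with that day's historical extremes
cmp <- merge(present, limits, by = "new_day")

# A record is a day that beats the historical extreme in either direction
cmp$record <- ifelse(cmp$temp > cmp$upper, "high",
              ifelse(cmp$temp < cmp$lower, "low", NA))
records <- cmp[!is.na(cmp$record), ]
```

The resulting `records` data frame could then be passed to `geom_point()` as a separate layer, with colour mapped to `record`.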

Custom legend
[Figure: temp against new_day, with a custom legend: New record high, New record low, past record high, past record low, 95% CI range, Current year]
Case Study II Weather (Part 2)

Up to now
[Figure: temp against new_day, with a custom legend: New record high, New record low, past record high, past record low, 95% CI range, Current year]

Situation
● Many data frames
● Plot summary data frame as a layer
● stat_summary()
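As an illustration of the last bullet, `stat_summary()` can compute per-day summaries inside the layer, so no separate summary data frame is needed. This is a sketch on simulated data, not the course's code. The `fun`, `fun.min` and `fun.max` arguments assume ggplot2 3.3.0 or later, and `my_data` here is a hypothetical stand-in.

```r
library(ggplot2)

# Hypothetical stand-in: daily temperatures for several years in one data frame
set.seed(1)
my_data <- expand.grid(new_day = 1:365, year = 1995:2015)
my_data$temp <- 50 - 25 * cos(2 * pi * my_data$new_day / 365) +
  rnorm(nrow(my_data), sd = 8)

p <- ggplot(my_data, aes(x = new_day, y = temp)) +
  # Range across all years for each day, computed inside the layer
  stat_summary(geom = "linerange", fun = mean,
               fun.min = min, fun.max = max, colour = "grey80") +
  # Mean across all years for each day
  stat_summary(geom = "line", fun = mean)
```

The raw data stays in a single data frame, and each layer derives the summary it needs at draw time.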

stat_historical()
> ggplot(my_data, aes(x = new_day, y = temp, fill = year)) +
    stat_historical()
[Figure: temp against new_day, historical layer only]

stat_present()
> ggplot(my_data, aes(x = new_day, y = temp, fill = year)) +
    stat_historical() +
    stat_present()
[Figure: temp against new_day, historical layer plus the current year]

stat_extremes()
> ggplot(my_data, aes(new_day, temp, fill = year)) +
    stat_historical() +
    stat_present() +
    stat_extremes(aes(colour = ..record..))
[Figure: temp against new_day, historical layer, current year and record points]

Specific layers
> ggplot(my_data, aes(new_day, temp, fill = year)) +
    stat_historical() +
    # stat_present() +
    stat_extremes(aes(colour = ..record..))
[Figure: temp against new_day, historical layer and record points, without the current-year line]

Facetting
[Figure: temp against new_day, one panel each for PARIS, REYKJAVIK, NEW YORK and LONDON]
Wrap-up

Statistics
Graphical Data Analysis
Design
Communication & Perception

Explore
Explain
Confirm and Analyse
Inform and Persuade

Element      Description
Data         The dataset being plotted.
Aesthetics   The scales onto which we map our data.
Geometries   The visual elements used for our data.
Facets       Plotting small multiples.
Statistics   Representations of our data to aid understanding.
Coordinates  The space on which the data will be plotted.
Themes       All non-data ink.

[Figure: Total sleep time (h) by Eating habits (Carnivore, Herbivore, Insectivore, Omnivore)]
[Figure: Yield (bushels/acre) by Year (1931, 1932) and Site (Waseca, Crookston, Morris, University Farm, Duluth, Grand Rapids)]
[Figure: Obese, Over-weight, Healthy-weight and Under-weight proportions (0.00 to 1.00) across ages 18 to 84, with a residual scale from -5.0 to 5.0]
[Figure: density of eruptions against waiting]
[Figure: Unemployment (%)]
[Figure: Sand, Silt and Clay, each on an axis from 0 to 100]

[Figure: Iris Sepals, Width against Length by Species (setosa, versicolor, virginica). Anderson, 1936]

[Figure: group2 against group1]
[Figure: temp against new_day, with a custom legend: New record high, New record low, past record high, past record low, 95% CI range, Current year]

Thank you!