DATA VISUALIZATION WITH GGPLOT2

Case Study I Bag Plot

Data Visualization with ggplot2

ggplot2 2.0
● Write your own extensions
● Extremely flexible

Create bag plot
● John Tukey (box plots)
● 2D box plot

Data Visualization with ggplot2

data set
> dim(df)
[1] 202   2
> head(df)
  type     Value
1    1  99.43952
2    1  99.76982
3    1 101.55871
4    1 100.07051
5    1 100.12929
6    1 101.71506

Data Visualization with ggplot2

2 box plots
> ggplot(df, aes(x = type, Value)) +
    geom_boxplot() +
    facet_wrap(~type, ncol = 2, scales = "free")

[Figure: box plots of Value by type, one panel each for type 1 and type 2, with free scales]

Data Visualization with ggplot2

slope plot
> df$ID <- ...
> ggplot(df, aes(x = type, Value, group = ID)) +
    geom_line(alpha = 0.3)

[Figure: slope plot of Value against type, one line per ID]
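The slide does not show how df$ID is filled in. A minimal sketch of one way to build such an ID, assuming the i-th observation of type 1 pairs with the i-th observation of type 2; the toy data and the pairing rule are assumptions, not the course's code:

```r
library(ggplot2)

# Toy long-format data: 5 observations per type (hypothetical values)
set.seed(1)
df <- data.frame(
  type  = rep(1:2, each = 5),
  Value = c(rnorm(5, mean = 100), rnorm(5, mean = 150))
)

# Number the observations within each type, so the i-th value of
# type 1 and the i-th value of type 2 share one ID (assumed pairing)
df$ID <- ave(df$Value, df$type, FUN = seq_along)

# One line per ID, running from its type 1 value to its type 2 value
p <- ggplot(df, aes(x = type, y = Value, group = ID)) +
  geom_line(alpha = 0.3)
```

Printing p draws the plot. If the real data carries an explicit pairing column, that column would be the safer choice for group.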

Data Visualization with ggplot2

Distribution of slope

Box plot?

[Figure: distribution of slope, with axis values from 40 to 50]
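Because the two types sit at x = 1 and x = 2, each slope is simply the type 2 value minus the type 1 value for the same ID. A short sketch of computing the slopes and drawing them as a single box plot, using hypothetical toy data and an assumed ID column:

```r
library(ggplot2)

# Toy paired data (hypothetical values); ID links the two types
set.seed(1)
df <- data.frame(
  type  = rep(1:2, each = 5),
  Value = c(rnorm(5, mean = 100), rnorm(5, mean = 150)),
  ID    = rep(1:5, times = 2)
)

# Split by type and align the rows on ID before subtracting
d1 <- df[df$type == 1, ]
d2 <- df[df$type == 2, ]
slopes <- data.frame(
  ID    = d1$ID,
  slope = d2$Value[match(d1$ID, d2$ID)] - d1$Value
)

# A single box summarising the distribution of slopes
p <- ggplot(slopes, aes(x = "", y = slope)) +
  geom_boxplot()
```

This summary keeps the pairing but discards the level of each group, which is part of why a two-dimensional display is considered next.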

Data Visualization with ggplot2

2 distinct variables
> head(dat)
     group1   group2
1  99.43952 149.2896
2  99.76982 150.2569
3 101.55871 149.7533
4 100.07051 149.6525
5 100.12929 149.0484
6 101.71506 149.9550
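The slides do not show how dat and df were created. The printed values look like normal noise around 100 and 150, so the following sketch simulates comparable data and derives the long format from the wide one. The seed, the 100 rows per group and the omission of the extra 101st observation are all assumptions:

```r
# Wide format: one column per variable (assumed seed and sample size)
set.seed(123)
dat <- data.frame(
  group1 = rnorm(100, mean = 100),
  group2 = rnorm(100, mean = 150)
)

# Long format: one row per value, with `type` recording the source column
df <- data.frame(
  type  = rep(1:2, each = nrow(dat)),
  Value = c(dat$group1, dat$group2)
)

head(dat)
head(df)
```

The wide form suits the scatter, density and bag plots, where each variable gets its own axis; the long form suits the box plots and the slope plot.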

Data Visualization with ggplot2

Scatter plot
> ggplot(dat, aes(x = group1, y = group2)) +
    geom_point()

[Figure: scatter plot of group2 against group1]

Data Visualization with ggplot2

2D density plot
> library(viridis)
> ggplot(dat, aes(x = group1, y = group2)) +
    stat_density_2d(geom = "tile", aes(fill = ..density..), contour = FALSE) +
    scale_fill_viridis()

[Figure: 2D density of group2 against group1, tiles filled by density]
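In ggplot2 3.3.0 and later, the ..density.. notation is superseded by after_stat(). A sketch of the same plot in the newer syntax, using simulated data and ggplot2's own scale_fill_viridis_c() in place of the viridis package; both substitutions are assumptions made to keep the example self-contained:

```r
library(ggplot2)

# Simulated stand-in for `dat` (assumed; the original data is not shown)
set.seed(123)
dat <- data.frame(
  group1 = rnorm(100, mean = 100),
  group2 = rnorm(100, mean = 150)
)

# after_stat(density) refers to the density computed by the stat,
# which is what ..density.. expressed in older ggplot2 versions
p <- ggplot(dat, aes(x = group1, y = group2)) +
  stat_density_2d(
    geom = "tile",
    aes(fill = after_stat(density)),
    contour = FALSE
  ) +
  scale_fill_viridis_c()
```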

Data Visualization with ggplot2

Bag plot
> library(aplpack)
> bagplot(dat[1:2])

[Figure: bag plot of group2 against group1, with the bag, the loop and the hull labelled]

Data Visualization with ggplot2

aplpack
> library(aplpack)
> plot_data <- ...
> names(plot_data)
 [1] "center"      "hull.center" "hull.bag"    "hull.loop"
 [5] "pxy.bag"     "pxy.outer"   "pxy.outlier" "hdepths"
 [9] "is.one.dim"  "prdata"      "xy"          "xydata"


Data Visualization with ggplot2

ggplot2
> ggplot(dat, aes(x = group1, y = group2)) +
    geom_point()

[Figure: scatter plot of group2 against group1]

Data Visualization with ggplot2

ggplot2
> ggplot(dat, aes(x = group1, y = group2)) +
    stat_bag(alpha = 0.2)

[Figure: bag plot of group2 against group1 drawn with stat_bag()]
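stat_bag() is not part of ggplot2 itself, and its definition is not shown on the slides. As a rough illustration of how a custom stat is written with ggproto, the sketch below follows the convex hull example from ggplot2's extension vignette. It draws the outer hull only, which is a much simpler stand-in for the bag and the loop. The names StatChull and stat_chull and the simulated data are illustrative assumptions:

```r
library(ggplot2)

# A Stat that keeps only the points lying on the convex hull of each group
StatChull <- ggproto("StatChull", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    # chull() returns the row indices of the hull vertices, in order
    data[chull(data$x, data$y), , drop = FALSE]
  }
)

# Layer function wrapping the Stat; draws a polygon by default
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Simulated stand-in for `dat` (assumed)
set.seed(123)
dat <- data.frame(
  group1 = rnorm(100, mean = 100),
  group2 = rnorm(100, mean = 150)
)

p <- ggplot(dat, aes(x = group1, y = group2)) +
  geom_point() +
  stat_chull(alpha = 0.2)
```

A bag plot stat would have the same shape: compute_group() would return the vertices of the bag and loop polygons instead of the hull, for example from values such as hull.bag and hull.loop listed above.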

Data Visualization with ggplot2

Remarks
● Useful but not popular
● Poorly understood
● Learn to use ggplot2 extensions

DATA VISUALIZATION WITH GGPLOT2

Let’s practice!

DATA VISUALIZATION WITH GGPLOT2

Case Study II Weather (Part 1)

Data Visualization with ggplot2

Weather

Source: http://www.edwardtufte.com/

Data Visualization with ggplot2

present
> dim(present)
[1] 153   5
> head(present, n = 4)
  month day year temp new_day
1     1   1 2016   41       1
2     1   2 2016   37       2
3     1   3 2016   40       3
4     1   4 2016   33       4
> tail(present, n = 4)
    month day year temp new_day
148     5  28 2016   79     148
149     5  29 2016   80     149
150     5  30 2016   73     150
151     5  31 2016   76     151

Data Visualization with ggplot2

Time series
> ggplot(present, aes(x = new_day, y = temp)) +
    geom_line()

[Figure: line plot of temp against new_day for the present year]

Data Visualization with ggplot2

past
> str(past)
'data.frame': 7645 obs. of 11 variables:
 $ month    : num 1 1 1 1 1 1 1 1 1 1 ...
 $ day      : num 1 2 3 4 5 6 7 8 9 10 ...
 $ year     : num 1995 1995 1995 1995 1995 ...
 $ temp     : num 44 41 28 31 21 27 42 35 34 29 ...
 $ new_day  : int 1 2 3 4 5 6 7 8 9 10 ...
 $ upper    : num 51 48 57 55 56 62 52 57 54 47 ...
 $ lower    : num 17 15 16 15 21 14 14 12 21 8.5 ...
 $ avg      : num 35.6 35.4 34.9 35.1 35.9 ...
 $ se       : num 2.19 1.83 2.46 2.53 1.92 ...
 $ avg_upper: num 40.2 39.2 40 40.5 39.9 ...
 $ avg_lower: num 31 31.5 29.7 29.8 31.9 ...
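The summary columns in past (upper, lower, avg, se, avg_upper, avg_lower) are per-day statistics taken across years. The slides do not show how they were computed; the sketch below is one plausible way in base R, on a small hypothetical data set. The multiplier 2.1 is inferred from the printed values (for example 35.6 + 2.1 × 2.19 ≈ 40.2) and should be treated as an assumption:

```r
# Toy historical record: 3 days observed in 4 years (hypothetical values)
past <- data.frame(
  new_day = rep(1:3, times = 4),
  year    = rep(1995:1998, each = 3),
  temp    = c(44, 41, 28, 51, 48, 57, 17, 15, 16, 35, 36, 34)
)

# Standard error of the mean
std_err <- function(x) sd(x) / sqrt(length(x))

# One row per day of the year, summarised across all years;
# tapply() orders its results by sorted new_day
by_day <- data.frame(
  new_day = sort(unique(past$new_day)),
  upper   = as.numeric(tapply(past$temp, past$new_day, max)),
  lower   = as.numeric(tapply(past$temp, past$new_day, min)),
  avg     = as.numeric(tapply(past$temp, past$new_day, mean)),
  se      = as.numeric(tapply(past$temp, past$new_day, std_err))
)

# Interval around the daily mean (multiplier 2.1 is an inferred assumption)
by_day$avg_upper <- by_day$avg + 2.1 * by_day$se
by_day$avg_lower <- by_day$avg - 2.1 * by_day$se

# Attach the summaries to every row, giving the same layout as `past`
past <- merge(past, by_day, by = "new_day")
```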

Data Visualization with ggplot2

Each year separately
> ggplot(past, aes(x = new_day, y = temp, group = year)) +
    geom_line(alpha = 0.2)

[Figure: temp against new_day, one semi-transparent line per year]

Data Visualization with ggplot2

present + past
> ggplot(past, aes(x = new_day, y = temp, group = year)) +
    geom_line(alpha = 0.4) +
    geom_line(data = present, aes(group = 1), col = "red")

[Figure: temp against new_day, past years as semi-transparent lines with the present year in red]

Data Visualization with ggplot2

Linerange

[Figure: linerange plot of temp against new_day]
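The code for this slide is not shown. A minimal sketch of how the per-day ranges could be drawn with geom_linerange(), assuming a summary data frame with the column names seen in str(past); the values and colours are hypothetical:

```r
library(ggplot2)

# Hypothetical per-day summary, one row per day of the year
by_day <- data.frame(
  new_day   = 1:3,
  lower     = c(17, 15, 16),
  upper     = c(51, 48, 57),
  avg_lower = c(31.0, 31.5, 29.7),
  avg_upper = c(40.2, 39.2, 40.0)
)

# Wide range: historical record low to record high.
# Narrow range: interval around the historical mean.
p <- ggplot(by_day, aes(x = new_day)) +
  geom_linerange(aes(ymin = lower, ymax = upper), colour = "grey80") +
  geom_linerange(aes(ymin = avg_lower, ymax = avg_upper), colour = "grey50")
```

Replacing one line per year with one vertical segment per day removes most of the overplotting while keeping the historical extremes visible.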

Data Visualization with ggplot2

Records

[Figure: temp against new_day, with points marking days on which the present year set a record]
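A record day is one where the present temperature falls outside the historical range for that day. The slides do not show this step; the sketch below is one way to find and plot such days, using hypothetical values and assumed column names:

```r
library(ggplot2)

# Hypothetical historical range and present-year temperatures
past_range <- data.frame(
  new_day = 1:4,
  lower   = c(17, 15, 16, 15),
  upper   = c(51, 48, 57, 55)
)
present <- data.frame(
  new_day = 1:4,
  temp    = c(41, 50, 40, 12)
)

# Compare each present value with the historical range for the same day
both <- merge(present, past_range, by = "new_day")
both$record <- ifelse(both$temp > both$upper, "high",
                ifelse(both$temp < both$lower, "low", "none"))

# Keep only the days that set a new record
records <- both[both$record != "none", ]

# Present-year line with the record days marked as points
p <- ggplot(present, aes(x = new_day, y = temp)) +
  geom_line() +
  geom_point(data = records, aes(colour = record))
```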

Data Visualization with ggplot2

Custom legend

[Figure: temp against new_day with a custom legend: New record high, New record low, past record high, past record low, 95% CI range, Current year]

DATA VISUALIZATION WITH GGPLOT2

Let’s practice!

DATA VISUALIZATION WITH GGPLOT2

Case Study II Weather (Part 2)

Data Visualization with ggplot2

Up to now

[Figure: temp against new_day with a custom legend: New record high, New record low, past record high, past record low, 95% CI range, Current year]

Data Visualization with ggplot2

Situation
● Many data frames
● Plot summary data frame as a layer
● stat_summary()
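As an illustration of the last point, stat_summary() can compute a per-day summary inside the layer, so no separate summary data frame is needed. A minimal sketch with hypothetical data; range_summary is an illustrative helper, not a ggplot2 function:

```r
library(ggplot2)

# Toy historical record: 3 days observed in 4 years (hypothetical values)
past <- data.frame(
  new_day = rep(1:3, times = 4),
  temp    = c(44, 41, 28, 51, 48, 57, 17, 15, 16, 35, 36, 34)
)

# Summary function for fun.data: returns one row with y, ymin and ymax
range_summary <- function(y) {
  data.frame(y = mean(y), ymin = min(y), ymax = max(y))
}

# The summary is computed at each unique x value and drawn as a linerange
p <- ggplot(past, aes(x = new_day, y = temp)) +
  stat_summary(fun.data = range_summary, geom = "linerange")
```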

Data Visualization with ggplot2

stat_historical()
> ggplot(my_data, aes(x = new_day, y = temp, fill = year)) +
    stat_historical()

[Figure: historical ranges of temp against new_day]
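stat_historical() is a custom stat whose definition is not shown here. As a hedged sketch of the general idea, the following ggproto Stat reduces the raw temperatures to a per-day minimum and maximum and hands them to a linerange geom. The names StatDayRange and stat_day_range, and the toy data, are hypothetical:

```r
library(ggplot2)

# A Stat that summarises y at each x into ymin and ymax
StatDayRange <- ggproto("StatDayRange", Stat,
  required_aes = c("x", "y"),
  dropped_aes = "y",  # y is consumed by the summary and not returned
  compute_group = function(data, scales) {
    # tapply() orders its results by sorted x, matching sort(unique(x))
    data.frame(
      x    = sort(unique(data$x)),
      ymin = as.numeric(tapply(data$y, data$x, min)),
      ymax = as.numeric(tapply(data$y, data$x, max))
    )
  }
)

# Layer function wrapping the Stat; draws vertical ranges by default
stat_day_range <- function(mapping = NULL, data = NULL, geom = "linerange",
                           position = "identity", na.rm = FALSE,
                           show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatDayRange, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Toy data: 3 days observed in 4 years (hypothetical values)
my_data <- data.frame(
  new_day = rep(1:3, times = 4),
  temp    = c(44, 41, 28, 51, 48, 57, 17, 15, 16, 35, 36, 34)
)

p <- ggplot(my_data, aes(x = new_day, y = temp)) +
  stat_day_range(colour = "grey60")
```

Packaging the summary in a stat means the same raw data frame can feed several layers, each deriving what it needs.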

Data Visualization with ggplot2

stat_present()
> ggplot(my_data, aes(x = new_day, y = temp, fill = year)) +
    stat_historical() +
    stat_present()

[Figure: historical ranges of temp against new_day with the present year overlaid]

Data Visualization with ggplot2

stat_extremes()
> ggplot(my_data, aes(new_day, temp, fill = year)) +
    stat_historical() +
    stat_present() +
    stat_extremes(aes(colour = ..record..))

[Figure: historical ranges and present year of temp against new_day, with record points marked]

Data Visualization with ggplot2

Specific layers
> ggplot(my_data, aes(new_day, temp, fill = year)) +
    stat_historical() +
    # stat_present() +
    stat_extremes(aes(colour = ..record..))

[Figure: historical ranges of temp against new_day with record points marked, present-year layer omitted]

Data Visualization with ggplot2

Faceting

[Figure: temp against new_day in four panels: PARIS, REYKJAVIK, NEW YORK, LONDON]
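The faceting code is not shown on the slide. A minimal sketch of splitting a plot into one panel per city with facet_wrap(), assuming the data has a city column; the column name and the simulated temperatures are assumptions:

```r
library(ggplot2)

# Simulated data: 30 days for each of four cities (hypothetical values)
set.seed(42)
cities <- c("PARIS", "REYKJAVIK", "NEW YORK", "LONDON")
my_data <- expand.grid(new_day = 1:30, city = cities)
my_data$temp <- 50 + rnorm(nrow(my_data), sd = 5)

# One panel per city, arranged in two columns
p <- ggplot(my_data, aes(x = new_day, y = temp)) +
  geom_line() +
  facet_wrap(~ city, ncol = 2)
```

Because stats are computed separately within each panel, layers built from custom stats can be faceted in the same way without extra summary data frames.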

DATA VISUALIZATION WITH GGPLOT2

Let’s practice!

DATA VISUALIZATION WITH GGPLOT2

Wrap-up

Data Visualization with ggplot2

Statistics

Graphical Data Analysis

Design

Communication & Perception

Data Visualization with ggplot2

Explore

Explain

Confirm and Analyse

Inform and Persuade

Data Visualization with ggplot2

Element      Description
Data         The dataset being plotted.
Aesthetics   The scales onto which we map our data.
Geometries   The visual elements used for our data.

Data Visualization with ggplot2

Element      Description
Data         The dataset being plotted.
Aesthetics   The scales onto which we map our data.
Geometries   The visual elements used for our data.
Facets       Plotting small multiples.
Statistics   Representations of our data to aid understanding.
Coordinates  The space on which the data will be plotted.
Themes       All non-data ink.
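To make the seven elements concrete, here is a small sketch that uses each of them once, on R's built-in iris data. The particular geoms, limits and theme are arbitrary choices for illustration:

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +  # data, aesthetics
  geom_point(alpha = 0.5) +                                   # geometries
  facet_wrap(~ Species) +                                     # facets
  stat_smooth(method = "lm", se = FALSE) +                    # statistics
  coord_cartesian(xlim = c(4, 8)) +                           # coordinates
  theme_minimal()                                             # themes
```

Only the first three elements are required for a plot; the remaining four fall back to defaults when omitted.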

Data Visualization with ggplot2

[Figures: total sleep time (h) by eating habits (Carnivore, Herbivore, Insectivore, Omnivore); yield (bushels/acre) by year (1931, 1932) for the sites Waseca, Crookston, Morris, University Farm, Duluth and Grand Rapids]

Data Visualization with ggplot2

[Figure: proportions of Under-weight, Healthy-weight, Over-weight and Obese for ages 18 to 84, with a residual scale from −5.0 to 5.0]

Data Visualization with ggplot2

[Figures: density of eruptions against waiting; Unemployment (%); Sand, Silt and Clay percentages]

Data Visualization with ggplot2

Iris Sepals

[Figure: sepal Width against Length by Species (setosa, versicolor, virginica); Anderson, 1936]

Data Visualization with ggplot2

[Figures: bag plot of group2 against group1; temp against new_day with legend entries New record high, New record low, past record high, past record low, 95% CI range, Current year]

DATA VISUALIZATION WITH GGPLOT2

Thank you!