IMPORTING DATA INTO R
Importing Excel Data readxl
Importing Data into R
Microso! Excel ●
Common data analysis tool
●
Many R packages to interact with Excel
●
readxl - Hadley Wickham
Importing Data into R
Typical Structure Excel Data Different sheets with tabular data Capital
Population
New York
16044000
Berlin
3433695 Capital 3010492 New York 1683713 Berlin
Madrid Stockholm year_1990
17800000 3382169
Madrid
2938723
Stockholm
1942362
year_2000 Source: Wikipedia
Population
Importing Data into R
readxl ●
excel_sheets()
list different sheets
●
read_excel()
actually import data into R
> install.packages("readxl") > library(readxl)
Importing Data into R
excel_sheets() > dir() [1] "cities.xlsx"
"the_rest_is_secret.txt"
Capital New York Berlin Madrid Stockholm
year_1990
Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm
year_2000
> excel_sheets("cities.xlsx") [1] "year_1990" "year_2000"
Population 17800000 3382169 2938723 1942362
Importing Data into R
read_excel() > read_excel("cities.xlsx")
1 2 3 4
Capital Population New York 16044000 Berlin 3433695 Madrid 3010492 Stockholm 1683713
> read_excel("cities.xlsx", sheet = 2) > read_excel("cities.xlsx", sheet = "year_2000") 1 2 3 4
Capital Population New York 17800000 Berlin 3382169 Madrid 2938723 Stockholm 1942362
Capital New York Berlin Madrid Stockholm
year_1990
Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm
year_2000
Population 17800000 3382169 2938723 1942362
Importing Data into R
read_excel() read_excel(path, sheet = 1, col_names = TRUE, col_types = NULL, skip = 0)
!
Importing Data into R
read_excel() - col_names read_excel(path, sheet = 1, col_names = TRUE, col_types = NULL, skip = 0)
col_names = FALSE: R assigns names itself col_names = character vector: manually specify
!
Importing Data into R
read_excel() - col_types !
read_excel(path, sheet = 1, col_names = TRUE, col_types = NULL, skip = 0)
Capital New York Berlin Madrid Stockholm
year_1990
Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm
year_2000
> pop_data str(pop_data) Classes 'tbl_df', 'tbl' and 'data.frame':
4 obs. of 2 variables: $ Capital : chr "New York" "Berlin" "Madrid" ... $ Population: chr "16044000" "3433695" "3010492" ...
numeric date blank
Population 17800000 3382169 2938723 1942362
Importing Data into R
read_excel() - col_types read_excel(path, sheet = 1, col_names = TRUE, col_types = NULL, skip = 0)
> read_excel("cities.xlsx", col_types = c("text", "blank")) 1 2 3 4
Capital New York Berlin Madrid Stockholm
!
Capital New York Berlin Madrid Stockholm
year_1990
Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm
year_2000
Population 17800000 3382169 2938723 1942362
Importing Data into R
read_excel() - skip !
read_excel(path, sheet = 1, col_names = TRUE, col_types = NULL, skip = 0)
year_1990
Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm
year_2000
> read_excel("cities.xlsx", col_names = c("Capital", "Population"), skip = 2) Capital Population 1 Berlin 3433695 2 Madrid 3010492 3 Stockholm 1683713
Capital New York Berlin Madrid Stockholm
n_max not (yet) available
Population 17800000 3382169 2938723 1942362
Importing Data into R
Wrap-up ●
excel_sheets()
●
read_excel()
●
Everything you need!
●
Fast
●
Same arguments as in readr package
●
Consistency
IMPORTING DATA INTO R
Let’s practice!
IMPORTING DATA INTO R
Importing Excel Data gdata
Importing Data into R
gdata ●
Gregory Warnes
●
Entire suite of tools for data manipulation
●
Supercharges basic R
●
read.xls()
●
Support for XLS
●
Support for XLSX with additional driver
●
No readxl::excel_sheets() equivalent
Importing Data into R
gdata read.table() > 15 arguments! XLS
Perl
CSV
read.csv()
R data frame
●
Elegant extension of utils package
●
Easy if familiar with utils
●
Extremely inefficient
●
readxl < v1.x
Importing Data into R
cities.xls Capital
Population
New York
16044000
Berlin
3433695 Capital 3010492 New York 1683713 Berlin
Madrid Stockholm year_1990
Population 17800000 3382169
Madrid
2938723
Stockholm
1942362
year_2000
Importing Data into R
read.xls() > install.packages("gdata") > library(gdata)
Capital New York Berlin Madrid Stockholm
year_1990
Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm
year_2000
> read.xls("cities.xls") 1 2 3 4
Capital Population New York 16044000 Berlin 3433695 Madrid 3010492 Stockholm 1683713
> read.xls("cities.xls", sheet = "year_2000") 1 2 3 4
Capital Population New York 17800000 Berlin 3382169 Madrid 2938723 Stockholm 1942362
Population 17800000 3382169 2938723 1942362
IMPORTING DATA INTO R
Let’s practice!
IMPORTING DATA INTO R
XLConnect
Importing Data into R
Importing Data into R
XLConnect ●
Martin Studer
●
Work with Excel through R
●
Bridge between Excel and R
●
Rich and easy-to-use functionality
●
Installation can be frustrating
●
XLS and XLSX
Importing Data into R
cities.xlsx Capital
Population
New York
16044000
Berlin
3433695 Capital 3010492 New York 1683713 Berlin
Madrid Stockholm year_1990
Population 17800000 3382169
Madrid
2938723
Stockholm
1942362
year_2000 > install.packages("XLConnect") # Install Java first! > library(XLConnect)
Importing Data into R
loadWorkbook() > book str(book) Formal class 'workbook' [package "XLConnect"]
with 2 slots ..@ filename: chr "cities.xlsx" ..@ jobj : ...
Importing Data into R
getSheets() > getSheets(book) [1] "year_1990" "year_2000"
Capital New York Berlin Madrid Stockholm
year_1990
Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm
year_2000
> library(readxl) > excel_sheets("cities.xlsx") [1] "year_1990" "year_2000"
Population 17800000 3382169 2938723 1942362
Importing Data into R
readWorksheet()
Capital New York Berlin Madrid Stockholm
year_1990
Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm
year_2000
Population 17800000 3382169 2938723 1942362
Importing Data into R
readWorksheet() > readWorksheet(book, sheet = "year_2000")
1 2 3 4
Capital Population New York 17800000 Berlin 3382169 Madrid 2938723 Stockholm 1942362
Capital New York Berlin Madrid Stockholm
year_1990
Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm
year_2000
Population 17800000 3382169 2938723 1942362
Importing Data into R
readWorksheet() Capital
Population
New York
17800000
Berlin
3382169
row 3
Madrid
2938723
row 4
Stockholm
1942362
year_2000
col 2
> readWorksheet(book, sheet = "year_2000",
startRow = 3, endRow = 4,
startCol = 2, header = FALSE) Col1 1 3382169 2 2938723
Importing Data into R
createSheet() Capital
Population
Capital
Population
New York
16044000
New York
17800000
Berlin
3433695
Berlin
3382169
Madrid
3010492
Madrid
2938723
Stockholm
1683713
Stockholm
1942362
year_1990
year_2000
> book createSheet(book, name = "year_2010")
year_2010
Importing Data into R
writeWorksheet() > data_to_add data_to_add 1 2 3 4
Capital Population New York 8191900 Berlin 3460725 Madrid 3273000 Stockholm 1372565
> writeWorksheet(book, data_to_add, sheet = "year_2010")
Importing Data into R
writeWorksheet() Capital
Population
Capital
Population
New York
16044000
New York
17800000
Berlin
3433695
Berlin
3382169
Madrid
3010492
Madrid
2938723
Stockholm
1683713
Stockholm
1942362
year_1990
year_2000
year_2010
> writeWorksheet(book, data_to_add, sheet = "year_2010")
Importing Data into R
writeWorksheet() Capital
Population
Capital
Population
Capital
Population
New York
16044000
New York
17800000
New York
8191900
Berlin
3433695
Berlin
3382169
Berlin
3460725
Madrid
3010492
Madrid
2938723
Madrid
3273000
Stockholm
1683713
Stockholm
1942362
Stockholm
1372565
year_1990
year_2000
year_2010
> writeWorksheet(book, data_to_add, sheet = "year_2010")
Importing Data into R
saveWorkbook() Explicitly save workbook as Excel File > saveWorkbook(book, "cities_extended.xlsx")
Capital
Population
cities_extended.xlsx
New York
16044000 Capital Population Berlin 3433695 New York 17800000 Capital Madrid 3010492 Berlin 3382169 New York Stockholm 1683713 Madrid 2938723 Berlin year_1990 Stockholm 1942362 Madrid year_2000 Stockholm year_2010
Population 8191900 3460725 3273000 1372565
Importing Data into R
Wrap-up ●
Basic operations
●
Reproducibility is the key!
●
More functionality ●
Removing sheets
●
Renaming sheets
●
Styling sheets
●
Manipulating Formulas
IMPORTING DATA INTO R
Let’s practice!