IMPORTING DATA INTO R

Report 7 Downloads 358 Views
IMPORTING DATA INTO R

Importing Excel Data readxl

Importing Data into R

Microso! Excel ●

Common data analysis tool



Many R packages to interact with Excel



readxl - Hadley Wickham

Importing Data into R

Typical Structure Excel Data Different sheets with tabular data Capital

Population

New York

16044000

Berlin

3433695 Capital 3010492 New York 1683713 Berlin

Madrid Stockholm year_1990

17800000 3382169

Madrid

2938723

Stockholm

1942362

year_2000 Source: Wikipedia

Population

Importing Data into R

readxl ●

excel_sheets()

list different sheets



read_excel()

actually import data into R

> install.packages("readxl") > library(readxl)

Importing Data into R

excel_sheets() > dir() [1] "cities.xlsx"

"the_rest_is_secret.txt"

Capital New York Berlin Madrid Stockholm

year_1990

Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm

year_2000

> excel_sheets("cities.xlsx") [1] "year_1990" "year_2000"

Population 17800000 3382169 2938723 1942362

Importing Data into R

read_excel() > read_excel("cities.xlsx")

1 2 3 4

Capital Population New York 16044000 Berlin 3433695 Madrid 3010492 Stockholm 1683713

> read_excel("cities.xlsx", sheet = 2) > read_excel("cities.xlsx", sheet = "year_2000") 1 2 3 4

Capital Population New York 17800000 Berlin 3382169 Madrid 2938723 Stockholm 1942362

Capital New York Berlin Madrid Stockholm

year_1990

Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm

year_2000

Population 17800000 3382169 2938723 1942362

Importing Data into R

read_excel() read_excel(path, sheet = 1, col_names = TRUE, col_types = NULL, skip = 0)

!

Importing Data into R

read_excel() - col_names read_excel(path, sheet = 1, col_names = TRUE, col_types = NULL, skip = 0)

col_names = FALSE: R assigns names itself col_names = character vector: manually specify

!

Importing Data into R

read_excel() - col_types !

read_excel(path, sheet = 1, col_names = TRUE, col_types = NULL, skip = 0)

Capital New York Berlin Madrid Stockholm

year_1990

Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm

year_2000

> pop_data str(pop_data) Classes 'tbl_df', 'tbl' and 'data.frame': 
 4 obs. of 2 variables: $ Capital : chr "New York" "Berlin" "Madrid" ... $ Population: chr "16044000" "3433695" "3010492" ...

numeric date blank

Population 17800000 3382169 2938723 1942362

Importing Data into R

read_excel() - col_types read_excel(path, sheet = 1, col_names = TRUE, col_types = NULL, skip = 0)

> read_excel("cities.xlsx", col_types = c("text", "blank")) 1 2 3 4

Capital New York Berlin Madrid Stockholm

!

Capital New York Berlin Madrid Stockholm

year_1990

Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm

year_2000

Population 17800000 3382169 2938723 1942362

Importing Data into R

read_excel() - skip !

read_excel(path, sheet = 1, col_names = TRUE, col_types = NULL, skip = 0)

year_1990

Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm

year_2000

> read_excel("cities.xlsx", col_names = c("Capital", "Population"), skip = 2) Capital Population 1 Berlin 3433695 2 Madrid 3010492 3 Stockholm 1683713

Capital New York Berlin Madrid Stockholm

n_max not (yet) available

Population 17800000 3382169 2938723 1942362

Importing Data into R

Wrap-up ●

excel_sheets()



read_excel()



Everything you need!



Fast



Same arguments as in readr package



Consistency

IMPORTING DATA INTO R

Let’s practice!

IMPORTING DATA INTO R

Importing Excel Data gdata

Importing Data into R

gdata ●

Gregory Warnes



Entire suite of tools for data manipulation



Supercharges basic R



read.xls()



Support for XLS



Support for XLSX with additional driver



No readxl::excel_sheets() equivalent

Importing Data into R

gdata read.table() > 15 arguments! XLS

Perl

CSV

read.csv()

R data frame



Elegant extension of utils package



Easy if familiar with utils



Extremely inefficient



readxl < v1.x

Importing Data into R

cities.xls Capital

Population

New York

16044000

Berlin

3433695 Capital 3010492 New York 1683713 Berlin

Madrid Stockholm year_1990

Population 17800000 3382169

Madrid

2938723

Stockholm

1942362

year_2000

Importing Data into R

read.xls() > install.packages("gdata") > library(gdata)

Capital New York Berlin Madrid Stockholm

year_1990

Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm

year_2000

> read.xls("cities.xls") 1 2 3 4

Capital Population New York 16044000 Berlin 3433695 Madrid 3010492 Stockholm 1683713

> read.xls("cities.xls", sheet = "year_2000") 1 2 3 4

Capital Population New York 17800000 Berlin 3382169 Madrid 2938723 Stockholm 1942362

Population 17800000 3382169 2938723 1942362

IMPORTING DATA INTO R

Let’s practice!

IMPORTING DATA INTO R

XLConnect

Importing Data into R

Importing Data into R

XLConnect ●

Martin Studer



Work with Excel through R



Bridge between Excel and R



Rich and easy-to-use functionality



Installation can be frustrating



XLS and XLSX

Importing Data into R

cities.xlsx Capital

Population

New York

16044000

Berlin

3433695 Capital 3010492 New York 1683713 Berlin

Madrid Stockholm year_1990

Population 17800000 3382169

Madrid

2938723

Stockholm

1942362

year_2000 > install.packages("XLConnect") # Install Java first! > library(XLConnect)

Importing Data into R

loadWorkbook() > book str(book) Formal class 'workbook' [package "XLConnect"] 
 with 2 slots ..@ filename: chr "cities.xlsx" ..@ jobj : ...

Importing Data into R

getSheets() > getSheets(book) [1] "year_1990" "year_2000"

Capital New York Berlin Madrid Stockholm

year_1990

Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm

year_2000

> library(readxl) > excel_sheets("cities.xlsx") [1] "year_1990" "year_2000"

Population 17800000 3382169 2938723 1942362

Importing Data into R

readWorksheet()

Capital New York Berlin Madrid Stockholm

year_1990

Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm

year_2000

Population 17800000 3382169 2938723 1942362

Importing Data into R

readWorksheet() > readWorksheet(book, sheet = "year_2000")

1 2 3 4

Capital Population New York 17800000 Berlin 3382169 Madrid 2938723 Stockholm 1942362

Capital New York Berlin Madrid Stockholm

year_1990

Population 16044000 Capital 3433695 New York 3010492 Berlin 1683713 Madrid Stockholm

year_2000

Population 17800000 3382169 2938723 1942362

Importing Data into R

readWorksheet() Capital

Population

New York

17800000

Berlin

3382169

row 3

Madrid

2938723

row 4

Stockholm

1942362

year_2000

col 2

> readWorksheet(book, sheet = "year_2000", 
 startRow = 3, endRow = 4, 
 startCol = 2, header = FALSE) Col1 1 3382169 2 2938723

Importing Data into R

createSheet() Capital

Population

Capital

Population

New York

16044000

New York

17800000

Berlin

3433695

Berlin

3382169

Madrid

3010492

Madrid

2938723

Stockholm

1683713

Stockholm

1942362

year_1990

year_2000

> book createSheet(book, name = "year_2010")

year_2010

Importing Data into R

writeWorksheet() > data_to_add data_to_add 1 2 3 4

Capital Population New York 8191900 Berlin 3460725 Madrid 3273000 Stockholm 1372565

> writeWorksheet(book, data_to_add, sheet = "year_2010")

Importing Data into R

writeWorksheet() Capital

Population

Capital

Population

New York

16044000

New York

17800000

Berlin

3433695

Berlin

3382169

Madrid

3010492

Madrid

2938723

Stockholm

1683713

Stockholm

1942362

year_1990

year_2000

year_2010

> writeWorksheet(book, data_to_add, sheet = "year_2010")

Importing Data into R

writeWorksheet() Capital

Population

Capital

Population

Capital

Population

New York

16044000

New York

17800000

New York

8191900

Berlin

3433695

Berlin

3382169

Berlin

3460725

Madrid

3010492

Madrid

2938723

Madrid

3273000

Stockholm

1683713

Stockholm

1942362

Stockholm

1372565

year_1990

year_2000

year_2010

> writeWorksheet(book, data_to_add, sheet = "year_2010")

Importing Data into R

saveWorkbook() Explicitly save workbook as Excel File > saveWorkbook(book, "cities_extended.xlsx")

Capital

Population

cities_extended.xlsx

New York

16044000 Capital Population Berlin 3433695 New York 17800000 Capital Madrid 3010492 Berlin 3382169 New York Stockholm 1683713 Madrid 2938723 Berlin year_1990 Stockholm 1942362 Madrid year_2000 Stockholm year_2010

Population 8191900 3460725 3273000 1372565

Importing Data into R

Wrap-up ●

Basic operations



Reproducibility is the key!



More functionality ●

Removing sheets



Renaming sheets



Styling sheets



Manipulating Formulas

IMPORTING DATA INTO R

Let’s practice!