##1. Reading Data - Advantages of `readr` package
###1.1 Motivation
Everything here was created as part of Kaggle's Talking-Data competition and my efforts towards learning R.
###1.2 Description
In `read-data.R`, input data is read using the `utils::read.csv()` command from base R:
```r
app_events <- read.csv("app_events.csv/app_events.csv",
                       colClasses = c("integer", "character", "integer", "integer"))
```
Whereas in `read-data_readr.R`, data is read using the `readr::read_csv()` command:
```r
library(readr) # load `readr` package
app_events <- read_csv("app_events.csv/app_events.csv", col_types = "icii")
```
###1.3 Observations
With the `readr` package the code looks neater and simpler: the number of lines drops from 22 to 11. Operations such as `options(stringsAsFactors = FALSE)` and converting `data.frame`s into `tibble`s with the `tbl_df()` command are not required in the latter script. The lines are also shorter, because the `col_types` argument in `readr` takes a compact string with one letter per column. One of the main advantages of the `readr` package is that it is a lot faster than `read.csv()`.
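
The speed difference can be checked locally with a rough timing sketch like the one below. It uses synthetic data written to a temporary file (not the competition data), and the column names `id`, `label` and `flag` are made up for illustration. Timings depend on the machine and file size, so treat the numbers as indicative only.

```r
library(readr)

# Synthetic data (hypothetical columns), only so both readers parse the same file.
n <- 200000
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id    = seq_len(n),
                     label = as.character(seq_len(n)),
                     flag  = rep(c(0L, 1L), n / 2)),
          tmp, row.names = FALSE)

# Time each reader on the same file with the column types given up front.
t_base  <- system.time(read.csv(tmp, colClasses = c("integer", "character", "integer")))
t_readr <- system.time(read_csv(tmp, col_types = "ici"))

# Elapsed wall-clock seconds for each reader.
elapsed <- c(base = t_base[["elapsed"]], readr = t_readr[["elapsed"]])
print(elapsed)
```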
###1.4 Comments
Two points to note here are the differences between the `utils::read.csv()` and `readr::read_csv()` commands, and between their `colClasses` and `col_types` arguments.
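
A minimal sketch of both differences, assuming a small made-up four-column file in place of the real `app_events.csv` (the column names and values here are illustrative only):

```r
library(readr)

# Hypothetical sample written to a temporary file (not the Kaggle data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("event_id,app_id,is_installed,is_active",
             "2,5927333115845830913,1,1",
             "6,-5720078949152207372,1,0"), tmp)

# Base R: one full class name per column.
df_base <- read.csv(tmp, colClasses = c("integer", "character", "integer", "integer"))

# readr: one letter per column (i = integer, c = character).
df_readr <- read_csv(tmp, col_types = "icii")

class(df_base)           # a plain data.frame
class(df_readr)          # a tibble, which also inherits from data.frame
sapply(df_base, class)   # column classes as requested via colClasses
sapply(df_readr, class)  # the same column classes via col_types
```

Both calls give the same column types; `read_csv()` returns a tibble directly, which is why the `tbl_df()` conversion is not needed in the `readr` script.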
###1.5 Credits
Thanks to Kaggle for the motivation and to adam-p's Cheatsheet for Markdown.