puneeth019/TalkingData_Puneeth

##1. Reading Data - Advantages of readr package

###1.1 Motivation

Everything here was created as part of Kaggle's Talking-Data competition and my efforts to learn R.

###1.2 Description

In read-data.R, the input data is read with base R's read.csv() (which lives in the utils package, so the qualified name is utils::read.csv()):

app_events <- read.csv("app_events.csv/app_events.csv",
                      colClasses = c("integer", "character", "integer", "integer"))

In read-data_readr.R, the same data is read with readr::read_csv() instead:

library(readr)  # load `readr` package
app_events <- read_csv("app_events.csv/app_events.csv", col_types = "icii")

###1.3 Observations

With the readr package the script is shorter and cleaner: it drops from 22 lines to 11. Steps such as setting options(stringsAsFactors = FALSE) and converting data.frames into tibbles with tbl_df() are not needed in the second script, because read_csv() never converts strings to factors and already returns a tibble. The code is also narrower, since the compact col_types string replaces the longer colClasses vector. Another main advantage of readr is that it is typically much faster than read.csv() on large files.
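The speed claim can be checked on your own machine with a rough timing sketch like the one below. It does not use the competition file: the data is synthetic, the column names are assumptions, and the row count is arbitrary, so treat the numbers it prints as indicative only.

```r
library(readr)

# Synthetic stand-in for app_events.csv; column names and sizes are assumptions.
set.seed(1)
n <- 200000
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(event_id     = sample.int(1000, n, replace = TRUE),
                     app_id       = as.character(sample.int(1e6, n, replace = TRUE)),
                     is_installed = 1L,
                     is_active    = sample(0:1, n, replace = TRUE)),
          tmp, row.names = FALSE)

# Time the base R reader with explicit column classes.
base_time <- system.time(
  base_df <- read.csv(tmp,
                      colClasses = c("integer", "character", "integer", "integer"))
)

# Time the readr reader with the equivalent compact column spec.
readr_time <- system.time(
  readr_df <- read_csv(tmp, col_types = "icii")
)

# Elapsed seconds for each reader; results depend on hardware and file size.
base_time["elapsed"]
readr_time["elapsed"]
```

A single run is noisy, so repeating the timing a few times gives a fairer comparison.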

###1.4 Comments

Two points to note here are the differences between the read.csv() (base R, utils package) and readr::read_csv() commands, and between their colClasses and col_types arguments.
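To make the two differences concrete, here is a minimal sketch that reads the same small file both ways. The two-row sample and its column names are hypothetical, chosen only to match the four column types used in the scripts.

```r
library(readr)

# Hypothetical two-row sample; column names and values are assumptions.
tmp <- tempfile(fileext = ".csv")
writeLines(c("event_id,app_id,is_installed,is_active",
             "2,5927333115845830913,1,1",
             "2,-5720078949152207372,1,0"), tmp)

# Base R: colClasses takes one class name per column in a character vector.
base_df <- read.csv(tmp,
                    colClasses = c("integer", "character", "integer", "integer"))

# readr: col_types accepts a compact string, one letter per column
# (i = integer, c = character).
readr_df <- read_csv(tmp, col_types = "icii")

class(base_df)   # a plain data.frame
class(readr_df)  # a tibble, which also inherits from data.frame
```

Reading app_id as character in both calls avoids losing precision on identifiers that are too large for R's integer type.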

###1.5 Credits

Thanks to Kaggle for the motivation and to adam-p's Cheatsheet for Markdown.