As part of the Data Analysis course at CodeClan, I was asked to complete at least 2 of the 6 tasks in the dirty data project (task 4 was mandatory). The aim of the project was to practise data cleaning skills, since it is often said that around 80% of the time in data science and analysis is spent on data cleaning.
The whole project was done in RStudio. Each task was supposed to have 4 separate folders:
- raw_data
- data_cleaning_scripts
- clean_data
- documentation_and_analysis
The analysis for each task, with comments, can be found in that task's analysis folder (documentation_and_analysis). The cleaning script is kept in a separate folder because it could potentially be run on other raw data with a similar structure and contents.
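To illustrate why the cleaning script lives in its own folder, here is a minimal sketch of a reusable cleaning function. It is not one of the actual project scripts: the function name `clean_raw_data()`, the file names and the demo data are all hypothetical, and the cleaning steps (standardising column names, dropping empty rows and columns, removing exact duplicates) are only assumed to be typical for this kind of task.

```r
# Hypothetical example, not taken from the project scripts.
library(readr)    # installed with the tidyverse
library(dplyr)    # installed with the tidyverse
library(janitor)

# Read a raw CSV, apply generic cleaning steps, and save the result.
clean_raw_data <- function(raw_path, clean_path) {
  cleaned <- read_csv(raw_path) %>%
    clean_names() %>%                              # snake_case column names
    remove_empty(which = c("rows", "cols")) %>%    # drop all-NA rows and columns
    distinct()                                     # drop exact duplicate rows
  write_csv(cleaned, clean_path)
  invisible(cleaned)
}

# Demo on a throwaway copy of the folder layout (made-up data).
demo_dir <- file.path(tempdir(), "task_demo")
dir.create(file.path(demo_dir, "raw_data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_dir, "clean_data"), showWarnings = FALSE)

raw_file   <- file.path(demo_dir, "raw_data", "example.csv")
clean_file <- file.path(demo_dir, "clean_data", "example_clean.csv")
writeLines(c("First Name,Total Score", "Ann,10", "Ann,10", ",", "Bob,7"), raw_file)

cleaned <- clean_raw_data(raw_file, clean_file)
```

Because the function only takes an input path and an output path, the same script could be pointed at another raw file with the same columns without touching the analysis code.
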
Analysis Folder | Task |
---|---|
Task 1 | Decathlon Results |
Task 2 | Cake Ingredients |
Task 3 | Seabird Sightings |
Task 4 | Halloween Candy Survey |
Task 5 | Right Wing Authoritarianism Survey |
Task 6 | Dog Survey |

Packages used:

Package | Version |
---|---|
assertr | 2.7 |
janitor | 2.0.1 |
tidyverse | 1.3.0 |
readxl | 1.3.1 |
plyr | 1.8.6 |
stringr | 1.4.0 |