rev

unhcRverse · Nov 8, 2023 · 674e641 · 674e641
1 parent 74d6325
commit 674e641
Showing 72 changed files with 902 additions and 813 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -21,20 +21,38 @@ options(scipen = 999)
 [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](code_of_conduct.md)
 [![R-CMD-check](https://github.com/impact-initiatives/cleaningtools/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/impact-initiatives/cleaningtools/actions/workflows/R-CMD-check.yaml)
 [![codecov](https://codecov.io/gh/impact-initiatives/cleaningtools/branch/master/graph/badge.svg?token=SOH3NGXQDU)](https://codecov.io/gh/impact-initiatives/cleaningtools)
-
 <!-- badges: end -->
 
+The `{cleaningtools}` package focuses on the survey data cleaning process. It supports a fully documented and reproducible cleaning process, based on the generation of a standardized `cleaning and deletion log`. With this type of process, quality assurance and auditing can be performed easily.
+
+This tool supports the implementation of IMPACT Initiatives / REACH guidance: [Data Cleaning Guidelines for Structured Data](https://www.reachresourcecentre.info/wp-content/uploads/2022/05/IMPACT_Data-Cleaning-Guidelines_FINAL_To-share-11.pdf) & [Data Cleaning Minimum Standards Checklist](https://www.reachresourcecentre.info/wp-content/uploads/2020/03/IMPACT_Memo_Data-Cleaning-Min-Standards-Checklist_28012020-1.pdf).
 
+The workflow supported by the tool includes:
 
-The `cleaningtools` package focuses on cleaning, and has three components: 
+ 1. Get your raw data and your form from your Kobo/ODK/ONA server.
+
+ 2. Define a __list of logical checks__ based on the specific content of your form. This is an Excel spreadsheet that defines checks describing incompatible responses (columns `check_id`, `description`, `check_to_perform`, `columns_to_clean`), such as "_primary_livelihood is rented but expenses less than 500000_" or "_access water and tank emptied_".
+
+ 3. Pipe a list of __systematic check__ functions to apply to the data (_outliers, shortest path, personally identifiable information, duration..._), including the logical checks previously defined. Each check produces its own log.
+
+ 4. Assemble the logs and export the __`cleaning log`__ to a dedicated Excel spreadsheet (`create_xlsx_cleaning_log()`) so that the person responsible for the cleaning can manually choose the cleaning action to perform from the following values:
 
-**1. Check**, which includes a set of functions that flag values, such as check_outliers and check_logical.  
+|Value|Definition|
+|-----|----------|
+|`change_response`|Change the response to new.value|
+|`blank_response`|Remove the response and set it to NA|
+|`remove_survey`|Delete the survey|
+|`no_action_value`|No action to take|
 
-**2. Create**, which includes a set of functions to create different items for use in cleaning, such as the cleaning log from the checks, clean data, and enumerator performance.  
+ 5. Apply the manually reviewed `cleaning log` to the raw data to obtain the __cleaned data__, aka `checked_dataset`.
+
+ 6. Then __review__ how the cleaning was applied through the dedicated reports: `review_cleaning()`, `review_the_others_log`, and `review_sf` for the sampling frame.
+
+
+Please check the package tutorial vignette to review the content in more detail.
 
-**3. Review**, which includes a set of functions to review the cleaning.
 
-## Installation
+## Installation & Usage
 
 You can install the development version from [GitHub](https://github.com/) with:
 
@@ -43,10 +61,24 @@ You can install the development version from [GitHub](https://github.com/) with:
 devtools::install_github("impact-initiatives/cleaningtools")
 ```
 
+The package comes with a parameterised report template to ease and speed up the full process.
+
+Once you have a good understanding of the process above, create an RStudio project, install the package, download your data and your form into a dedicated sub-folder (for instance `data-raw`), create an Excel file for your `logical checks`, and add the file defining your `sampling plan`, if any.
+
+Then create a notebook using the `clean` notebook template included in the package and start documenting all the parameters. 
+
+Once done, you can run the code chunks one after the other. After the first chapter, you should have a `cleaning log` file created within the same `data-raw` folder. Open it and manually set the cleaning action for each check.
+
+Then run the last few chunks to apply the log and review the results.
+
+Et voilà, you should then have the `cleaned_data` in your `data-raw` folder.
+
+## Current Limitation
 
+The package assumes that the survey data is a single data frame. It does not work out of the box with a datalist, i.e. a survey dataset that has more than one data frame.
 
 ## Code of Conduct
 
-Please note that the cleaningtools project is released with a [Contributor Code of Conduct](https://impact-initiatives.github.io/cleaningtools/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
+Please note that the {cleaningtools} project is released with a [Contributor Code of Conduct](https://impact-initiatives.github.io/cleaningtools/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms. Developers should check the `dev/function_documentation.Rmd` notebook created with [{fusen}](https://thinkr-open.github.io/fusen/index.html).
 
 
diff --git a/README.md b/README.md
@@ -7,22 +7,59 @@
 Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](code_of_conduct.md)
 [![R-CMD-check](https://github.com/impact-initiatives/cleaningtools/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/impact-initiatives/cleaningtools/actions/workflows/R-CMD-check.yaml)
 [![codecov](https://codecov.io/gh/impact-initiatives/cleaningtools/branch/master/graph/badge.svg?token=SOH3NGXQDU)](https://codecov.io/gh/impact-initiatives/cleaningtools)
-
 <!-- badges: end -->
 
-The `cleaningtools` package focuses on cleaning, and has three
-components:
+The `{cleaningtools}` package focuses on the survey data cleaning
+process. It supports a fully documented and reproducible cleaning
+process, based on the generation of a standardized
+`cleaning and deletion log`. With this type of process, quality
+assurance and auditing can be performed easily.
+
+This tool supports the implementation of IMPACT Initiatives / REACH
+guidance: [Data Cleaning Guidelines for Structured
+Data](https://www.reachresourcecentre.info/wp-content/uploads/2022/05/IMPACT_Data-Cleaning-Guidelines_FINAL_To-share-11.pdf)
+& [Data Cleaning Minimum Standards
+Checklist](https://www.reachresourcecentre.info/wp-content/uploads/2020/03/IMPACT_Memo_Data-Cleaning-Min-Standards-Checklist_28012020-1.pdf).
+
+The workflow supported by the tool includes:
+
+1.  Get your raw data and your form from your Kobo/ODK/ONA server.
+
+2.  Define a **list of logical checks** based on the specific content of
+    your form. This is an Excel spreadsheet that defines checks
+    describing incompatible responses (columns `check_id`,
+    `description`, `check_to_perform`, `columns_to_clean`), such as
+    “*primary_livelihood is rented but expenses less than 500000*” or
+    “*access water and tank emptied*”.
+
+3.  Pipe a list of **systematic check** functions to apply to the data
+    (*outliers, shortest path, personally identifiable information,
+    duration…*), including the logical checks previously defined.
+    Each check produces its own log.
+
+4.  Assemble the logs and export the **`cleaning log`** to a dedicated
+    Excel spreadsheet (`create_xlsx_cleaning_log()`) so that the person
+    responsible for the cleaning can manually choose the cleaning
+    action to perform from the following values:
+
+| Value             | Definition                           |
+|-------------------|--------------------------------------|
+| `change_response` | Change the response to new.value     |
+| `blank_response`  | Remove the response and set it to NA |
+| `remove_survey`   | Delete the survey                    |
+| `no_action_value` | No action to take                    |
 
-**1. Check**, which includes a set of functions that flag values, such
-as check_outliers and check_logical.
+5.  Apply the manually reviewed `cleaning log` to the raw data to obtain
+    the **cleaned data**, aka `checked_dataset`.
 
-**2. Create**, which includes a set of functions to create different
-items for use in cleaning, such as the cleaning log from the checks,
-clean data, and enumerator performance.
+6.  Then **review** how the cleaning was applied through the dedicated
+    reports: `review_cleaning()`, `review_the_others_log`, and
+    `review_sf` for the sampling frame.
 
-**3. Review**, which includes a set of functions to review the cleaning.
+Please check the package tutorial vignette to review the content in more
+detail.
 
-## Installation
+## Installation & Usage
 
 You can install the development version from
 [GitHub](https://github.com/) with:
@@ -32,9 +69,39 @@ You can install the development version from
 devtools::install_github("impact-initiatives/cleaningtools")
 ```
 
+The package comes with a parameterised report template to ease and
+speed up the full process.
+
+Once you have a good understanding of the process above, create an
+RStudio project, install the package, download your data and your form
+into a dedicated sub-folder (for instance `data-raw`), create an Excel
+file for your `logical checks`, and add the file defining your
+`sampling plan`, if any.
+
+Then create a notebook using the `clean` notebook template included in
+the package and start documenting all the parameters.
+
+Once done, you can run the code chunks one after the other. After the
+first chapter, you should have a `cleaning log` file created within the
+same `data-raw` folder. Open it and manually set the cleaning action
+for each check.
+
+Then run the last few chunks to apply the log and review the results.
+
+Et voilà, you should then have the `cleaned_data` in your `data-raw`
+folder.
+
+## Current Limitation
+
+The package assumes that the survey data is a single data frame. It does
+not work out of the box with a datalist, i.e. a survey dataset that has
+more than one data frame.
+
 ## Code of Conduct
 
-Please note that the cleaningtools project is released with a
+Please note that the {cleaningtools} project is released with a
 [Contributor Code of
 Conduct](https://impact-initiatives.github.io/cleaningtools/CODE_OF_CONDUCT.html).
-By contributing to this project, you agree to abide by its terms.
+By contributing to this project, you agree to abide by its terms.
+Developers should check the `dev/function_documentation.Rmd` notebook
+created with [{fusen}](https://thinkr-open.github.io/fusen/index.html).
diff --git a/data-raw/logical_check_list.xlsx b/data-raw/logical_check_list.xlsx
diff --git a/data-raw/review.xlsx b/data-raw/review.xlsx
diff --git a/docs/404.html b/docs/404.html
diff --git a/docs/CODE_OF_CONDUCT.html b/docs/CODE_OF_CONDUCT.html
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
diff --git a/docs/LICENSE.html b/docs/LICENSE.html