During data cleaning for a survey, analysts typically must write countless scripts to inspect and clean every variable in each survey data file. To a large degree, this involves:
- Creating an empty cleaning program for a given scope of cleaning (per data file, survey module, analyst area of expertise, etc.)
- Copying information for each variable from Survey Solutions Designer (e.g., enablement conditions, validation conditions, etc.)
- Pasting information into the empty Stata file in a consistent, readable format
- Transforming the copied information following known rules/procedures (e.g., translating Survey Solutions' expressions into Stata, writing the same form of check(s) for variables of a given question type, etc.)
- Repeating (ad nauseam) for each desired cleaning program
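To illustrate how mechanical the "transform" step can be, here is a minimal sketch in R. It assumes a condition written with C#-style logical operators, as Survey Solutions conditions are, and rewrites those operators in Stata form. The helper `suso_to_stata()` and the variable `s02q02` are hypothetical, made up for this example; this is not {cleanstart}'s actual implementation, and a real translation would need to handle many more cases.

```r
# Hypothetical helper (not part of {cleanstart}): rewrite the C#-style
# logical operators in a Survey Solutions condition as Stata operators.
suso_to_stata <- function(expr) {
  expr <- gsub("&&", "&", expr, fixed = TRUE) # logical AND
  expr <- gsub("||", "|", expr, fixed = TRUE) # logical OR
  expr
}

# Illustrative enablement condition; the variable names are assumptions
enablement <- "s02q01 == 1 && s02q02 != 3"
suso_to_stata(enablement)
# [1] "s02q01 == 1 & s02q02 != 3"
```

Doing this by hand for every variable in every data file is exactly the kind of repetitive work that is easy to get wrong and easy to automate.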
Built on the conviction that computers are better at copy-paste-transform operations than humans, {cleanstart} provides analysts with an interactive graphical interface for creating a template cleaning program. From there, the analyst can dedicate their time, skill, and judgment to a task that computers aren't (yet) good at doing: cleaning data.
To get started, the analyst simply:
- Installs the package (a one-time operation)
- Downloads a JSON file containing questionnaire metadata (a one-time-per-survey operation)
- Opens the app's graphical interface
- Selects the range of variables to clean (e.g., from `s02q01` to `s02q31a`)
- Provides optional information on desired customizations (e.g., replace `@rowcode` with `members__id`, identify "other (specify)" variables as those ending in `_os`, etc.)
- Provides a file name for the Stata .do file
- Downloads the Stata .do file template
Since {cleanstart} is not yet available on CRAN, it can be installed from GitHub as follows:
if (!require("pak")) install.packages("pak")
pak::pak("lsms-worldbank/cleanstart")
To open the app's graphical interface, run these commands in the R console:
library(cleanstart)
run_app()