diff --git a/NEWS.md b/NEWS.md
index 12c7cd4..389b8c4 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,7 @@
+# blueprintr 0.2.5.9000 (dev version)
+* Updated the vignettes
+* Added a new folder under `inst` to add metadata to the vignettes
+
 # blueprintr 0.2.5
 * Add capability to embed custom messages to check results, using `check.errors` attribute in returned logical value
 * Refactor side-effect messages from built-in checks to `check.errors`
diff --git a/inst/mapping/mtcars_item_mapping.csv b/inst/mapping/mtcars_item_mapping.csv
new file mode 100644
index 0000000..6732afb
--- /dev/null
+++ b/inst/mapping/mtcars_item_mapping.csv
@@ -0,0 +1,13 @@
+"name_1","description_1","coding_1","panel","homogenized_name","homogenized_coding","homogenized_description"
+"rn","Name of car","NA","MTCARS_PANEL","name","NA","Name of Car"
+"mpg","Miles per gallon","NA","MTCARS_PANEL","mpg","NA","Miles per gallon"
+"cyl","Number of cylinders","NA","MTCARS_PANEL","cyl","NA","Number of cylinders"
+"disp","Displacement","NA","MTCARS_PANEL","disp","NA","Displacement"
+"hp","Gross horsepower","NA","MTCARS_PANEL","hp","NA","Gross horsepower"
+"drat","Rear axle ratio","NA","MTCARS_PANEL","drat","NA","Rear axle ratio"
+"wt","Weight","NA","MTCARS_PANEL","wt","NA","Weight"
+"qsec","Quarter mile time","NA","MTCARS_PANEL","qsec","NA","Quarter mile time"
+"vs","Engine","coding(code(""1"",""1""), code(""0"", ""0""))","MTCARS_PANEL","vs","coding(code(""1"",""straight""), code(""0"", ""v-shaped""))","Engine"
+"am","Transmission","coding(code(""1"",""1""), code(""0"", ""0""))","MTCARS_PANEL","am","coding(code(""1"",""manual""), code(""0"", ""automatic""))","Transmission"
+"gear","Number of forward gears","NA","MTCARS_PANEL","gear","NA","Number of forward gears"
+"carb","Number of carburetors","NA","MTCARS_PANEL","carb","NA","Number of carburetors"
diff --git a/inst/project/blueprints/example/homogenized.csv b/inst/project/blueprints/example/homogenized.csv
new file mode 100644
index 0000000..4eb0be9
--- /dev/null
+++ b/inst/project/blueprints/example/homogenized.csv
@@ -0,0 +1,14 @@
+"name","type","description","coding"
+"name","character","Name of Car",
+"mpg","double","Miles per gallon",
+"cyl","double","Number of cylinders",
+"disp","double","Displacement",
+"hp","double","Gross horsepower",
+"drat","double","Rear axle ratio",
+"wt","double","Weight",
+"qsec","double","Quarter mile time",
+"vs","character","Engine","coding(code(""straight"",""1""), code(""v-shaped"",""0""))"
+"am","character","Transmission","coding(code(""manual"",""1""), code(""automatic"",""0""))"
+"gear","double","Number of forward gears",
+"carb","double","Number of carburetors",
+"wave","character",,
diff --git a/vignettes/blueprintr.Rmd b/vignettes/blueprintr.Rmd
index 7696bd4..d27b31f 100644
--- a/vignettes/blueprintr.Rmd
+++ b/vignettes/blueprintr.Rmd
@@ -1,5 +1,5 @@
 ---
-title: "A Walkthrough of blueprintr"
+title: "Introduction to blueprintr"
 output: rmarkdown::html_vignette
 vignette: >
   %\VignetteIndexEntry{A Walkthrough of blueprintr}
@@ -40,88 +40,112 @@ cache_location <- tempdir()
 drake::clean(cache = drake::drake_cache(cache_location))
 ```
-blueprintr is a companion to [drake](https://github.com/ropensci/drake) that focuses on documenting and testing tabular data. Whereas drake manages the workflow execution, blueprintr defines a collection of steps that need to be run in a drake workflow.
+`blueprintr` is a framework for managing your data assets in a reproducible fashion. While it uses [drake](https://github.com/ropensci/drake) or [targets](https://cran.r-project.org/web/packages/targets/), it adds automated steps for tabular dataset documentation and testing. This allows researchers to create a replicable framework to prevent programming issues from affecting analysis results.
-# Basic Use
+## Installation
-The first, and recommended, step is to attach blueprintr to your R session with `library()`.
+```{r setup, results= FALSE}
+# install.packages("remotes")
+# remotes::install_github("nyuglobalties/blueprintr")
-```{r setup}
 library(blueprintr)
 ```
-In a [drake project](https://books.ropensci.org/drake/projects.html), all packages that you want attached are declared in a `"packages.R"` file. This `library(blueprintr)` command should go there.
+## Designed Use of blueprintr
+`blueprintr` provides your data with guardrails typically found in software engineering workflows.
+This allows you to test and document before deploying to production.
-blueprintr is built around "blueprints." Our first blueprint will be a blueprint for `mtcars`:
+The top level of the `blueprintr` workflow is a "blueprints" directory, consisting of `.R` and `.csv` files.
-```{r}
-blueprint(
-  "mtcars_dat",
-  description = "The famous mtcars dataset",
-  command = {
-    mtcars
-  }
-)
-```
+### About blueprints
+Each blueprint has two components to it:
+* Data Construction Spec, usually a `.R` file that instructs drake or targets on how to build a specific dataset.
+* Metadata, usually a `.csv` file that incorporates any mapping files and checks that need to be done on the dataset.
-All blueprints have
+In order to create a blueprint, we use the `blueprint` function. This function takes three arguments: name (the name of your generated dataset), description (a description of your dataset), command (any functions that need to be applied in order to build the dataset).
-* A name (the first argument) for the _target_ dataset.
-* A description or brief summary of what the target is. Can be `NULL`.
-* A command, which is a quoted statement that has the code for building this target.
-* A metadata location, which is a path to where the target metadata is saved.
+A project may need only a few blueprints, but more likely you'll need nested blueprints to transform the data.
-