blueprintr is a companion to targets that adds automated steps for tabular dataset documentation and testing. Designed for social science research projects, this package creates a framework to build trust in your data and to prevent programming issues from affecting your analysis results.
Define blueprints of your data using blueprint(). Blueprints combine
dataset creation code with some extra metadata about the output tabular
dataset, including the name and description of the data. By convention,
these blueprint calls are stored in scripts in the “blueprints” folder
of your project:
# blueprints/blueprint1.R
blueprint(
  "blueprint1",
  description = "My first blueprint",
  command = {
    # Put all code related to building this dataset here
    mtcars
  }
)
Refer to other datasets using .TARGET() to guarantee that parent
datasets are also tested and documented. Run checks on the whole
dataset with the checks parameter, and define variable-level tests in
the metadata files. You can store the check functions in the
conventional “R” folder:
# R/checks.R
no_missing_cyl <- function(df) {
  all(!is.na(df$cyl))
}

# blueprints/blueprint2.R
blueprint2 <- blueprint(
  "blueprint2",
  description = "My second blueprint that depends on another",
  checks = check_list(
    no_missing_cyl()
  ),
  command = .TARGET("blueprint1") %>%
    filter(cyl == 4)
)
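A check function such as no_missing_cyl can be tried on its own before it is wired into a blueprint. The sketch below uses only base R and assumes that a dataset-level check is a plain function taking the data frame as its first argument and returning a single TRUE or FALSE, as no_missing_cyl does. The `broken` data frame and the `ok` and `bad` variables are hypothetical names introduced for illustration.

```r
# A dataset-level check: takes a data frame, returns a single TRUE or FALSE
no_missing_cyl <- function(df) {
  all(!is.na(df$cyl))
}

# The built-in mtcars data has no missing values, so the check passes
ok <- no_missing_cyl(mtcars)
print(ok)   # TRUE

# Introduce a missing value (hypothetical data) to see the check fail
broken <- mtcars
broken$cyl[1] <- NA
bad <- no_missing_cyl(broken)
print(bad)  # FALSE
```

In blueprint2, the same function is listed inside check_list() as no_missing_cyl(), without the data frame argument.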
Once all blueprints are defined, add them to your _targets.R pipeline
file:
# _targets.R
library(targets)
library(blueprintr)

list(
  # ... Other steps ...
  tar_blueprints()
)
targets handles the code execution based on the steps generated by blueprintr.
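As a minimal sketch of what running the pipeline could look like, the snippet below uses the targets functions tar_manifest() and tar_make(). It assumes the working directory is the project root that contains _targets.R. The guard and the `has_pipeline` variable are illustrative additions, not part of blueprintr.

```r
# Run from the project root, where _targets.R lives
has_pipeline <- file.exists("_targets.R")

if (has_pipeline) {
  # List the targets defined by the pipeline, including blueprintr's steps
  print(targets::tar_manifest())

  # Build every target that is missing or out of date
  targets::tar_make()
} else {
  message("No _targets.R found in ", getwd())
}
```

On later runs, tar_make() skips targets whose code and upstream dependencies have not changed.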
Stable versions of blueprintr can be installed from Global TIES’ r-universe:
install.packages("blueprintr", repos = "https://nyuglobalties.r-universe.dev")
Development versions can be installed from this repository:
install.packages("remotes")
remotes::install_github("nyuglobalties/blueprintr")
Please note that the ‘blueprintr’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.