First draft for text description of the data #69

Merged
merged 16 commits into from
May 16, 2024
2 changes: 2 additions & 0 deletions DESCRIPTION
@@ -28,9 +28,11 @@ Imports:
    rlang
Suggests:
    dbplyr,
    glue,
    knitr,
    rmarkdown,
    spelling,
    stringr,
    testthat (>= 3.0.0),
    tibble
VignetteBuilder:
85 changes: 85 additions & 0 deletions R/as-markdown.R
@@ -0,0 +1,85 @@
#' Convert the register data sources
#'
#' @param caption Caption to add to the table.
#'
#' @return A character vector as a Markdown table.
#' @keywords internal
#'
registers_as_md_table <- function(caption = NULL) {
  rlang::check_installed("glue")
  rlang::check_installed("knitr")
Review comment on lines +9 to +10 (Member):
Whenever you use a function from a package that you set as "suggests" as a dependency, you need to include a check function to inform the user to install these packages if they are not installed. So that's what these do (when you use use_package("packagename", "suggests"), it tells you exactly what to do)


  variable_description |>
    dplyr::select(
      .data$register_name,
Review comment (Member):
I think I mentioned this already, but the `.data$` prefix is used to mask the variable so that CRAN doesn't warn about "undeclared variables", since we already declare the placeholder variable `.data`.

      .data$register_abbrev,
      .data$start_year,
      .data$end_year
    ) |>
    dplyr::mutate(
      end_year = dplyr::if_else(is.na(.data$end_year), "present", as.character(.data$end_year)),
      years = glue::glue("{start_year} - {end_year}"),
      register_abbrev = glue::glue("`{register_abbrev}`")
    ) |>
    dplyr::distinct() |>
    dplyr::select(
      "Register" = .data$register_name,
      "Abbreviation" = .data$register_abbrev,
      "Years" = .data$years
    ) |>
    knitr::kable(caption = caption)
}

#' Convert the register name into text to use in a Markdown header.
#'
#' @param register The abbreviation of the register name.
#'
#' @return A character vector.
#' @keywords internal
#'
register_as_md_header <- function(register) {
  rlang::check_installed("glue")

  variable_description |>
    dplyr::distinct(.data$register_name, .data$register_abbrev) |>
    dplyr::filter(.data$register_abbrev == register) |>
    glue::glue_data(
      "`{register_abbrev}`: {register_name}"
    )
}

#' Convert the fake register data into a Markdown table.
#'
#' @param register The abbreviation of the register name.
#' @param caption A caption to add to the table.
#'
#' @return A character vector as a Markdown table.
#' @keywords internal
#'
register_data_as_md_table <- function(register, caption = NULL) {
  rlang::check_installed("glue")
  rlang::check_installed("knitr")

  register_data[[register]] |>
    head(4) |>
    knitr::kable(caption = caption)
}

#' Converts the variables for a register into a Markdown table.
#'
#' @inheritParams register_data_as_md_table
#'
#' @return A character vector as a Markdown table.
#' @keywords internal
#'
variables_as_md_table <- function(register, caption = NULL) {
  rlang::check_installed("glue")
  rlang::check_installed("knitr")
  rlang::check_installed("stringr")

  variable_description |>
    dplyr::filter(.data$register_abbrev == register) |>
    dplyr::select(.data$variable_name, .data$english_description) |>
    dplyr::mutate(english_description = stringr::str_to_sentence(.data$english_description)) |>
    knitr::kable(caption = caption)
}
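
Editorial sketch (not part of the PR): the two review points in this file, guarding optional packages with `rlang::check_installed()` and referring to columns through the `.data` pronoun, can be tried on a toy data frame. The object `toy_description` and the helper `toy_variables()` are hypothetical stand-ins for the package's internal `variable_description` and its helpers, and the sketch assumes `dplyr` and `rlang` are installed.

```r
# Hypothetical stand-in for the package's internal `variable_description`.
toy_description <- data.frame(
  register_name = c("Register A", "Register A", "Register B"),
  register_abbrev = c("ra", "ra", "rb"),
  variable_name = c("pnr", "date", "pnr")
)

# Hypothetical helper, not an osdc function.
toy_variables <- function(data, register) {
  # Stops with an informative error if the package is missing
  # (in an interactive session it offers to install it instead).
  rlang::check_installed("dplyr")

  data |>
    # `.data$register_abbrev` is looked up in the data frame only;
    # `register` is not a column, so it comes from the function argument.
    dplyr::filter(.data$register_abbrev == register) |>
    dplyr::pull("variable_name")
}

toy_variables(toy_description, "ra")
```

Inside a package, `.data` would additionally be imported from `rlang` so that `R CMD check` sees it as a declared object; in a plain script the pronoun is available inside dplyr verbs without that step.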
83 changes: 83 additions & 0 deletions vignettes/data-sources.Rmd
@@ -0,0 +1,83 @@
---
title: "Data sources"
output: rmarkdown::html_vignette
bibliography: references.bib
csl: vancouver.csl
vignette: >
  %\VignetteIndexEntry{Data sources}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  echo = FALSE,
  results = "asis",
Review comment (Member):
This tells knitr to write the chunk output into the document as-is (so it is rendered as Markdown) rather than as code output text.

  collapse = TRUE,
  comment = "#>"
)
library(tidyverse)
library(osdc)
```

This document describes the sources of data needed by the OSDC algorithm
and gives a brief overview of each of these sources and what they might
look like. The algorithm uses these Danish registers as input data
sources:

```{r, results='asis'}
osdc:::registers_as_md_table("Danish registers used in the OSDC algorithm.")
Review comment (Collaborator, Author):
Are the triple colons there because this is an internal function/object, or what is going on here?
Also, a note to my future self: you need to build the package in order to access internal objects

Review comment (Member):
haah yes! Or use Ctrl-Shift-L to load the package. And yea, ::: accesses all internal objects in a package, neat trick!

```
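
Editorial sketch (not part of the PR): the difference between `::` and `:::` discussed in the review thread can be checked with a base R package. `plot.lm` is used here only as a convenient example of an object that lives in the `stats` namespace without being exported.

```r
# `plot.lm` is defined in the stats namespace but is not exported,
# so it is reachable with `:::` only.
internal <- stats:::plot.lm
is.function(internal)

# The list of exported names does not contain it.
"plot.lm" %in% getNamespaceExports("stats")
```

For a package under development, the namespace has to be available first, either by building and installing the package or by loading it with `devtools::load_all()` (the Ctrl-Shift-L shortcut in RStudio).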

In a future revision, the algorithm could also use the Danish Medical
Birth Register to extend the period of valid inclusions further back in
time than is possible using obstetric codes from the National Patient
Register.

## Expected data structure

This section describes what the data sources are expected to look like
when they are input into the OSDC algorithm. We try to mimic as closely
as possible what the raw data looks like within Statistics Denmark.
Since registers are often stored on a per-year basis, we don't expect a
year variable in the data itself. If you've processed the data so that
it has a year variable, you will likely need to use a
split-apply-combine approach when using the osdc package. We internally
convert all variable names to lower case, and so we present them here in
lower case, but case may vary between data sources (and even between
years in the same data source) in real data.
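
Editorial sketch (not part of the PR): one way to read the split-apply-combine advice, using base R only. The data frame `processed`, its column names, and the per-year step `count_rows()` are all hypothetical placeholders; `count_rows()` stands in for whatever per-year processing is needed and is not an osdc function.

```r
# Hypothetical processed data that has a year column and mixed-case names.
processed <- data.frame(
  ID = c("001", "002", "001", "003"),
  Year = c(2017, 2017, 2018, 2018),
  Code = c("a", "b", "a", "c")
)

# Lower-case the variable names, mirroring what the vignette says osdc does.
names(processed) <- tolower(names(processed))

# Split: one data frame per year, without the year column,
# which is how the vignette expects the registers to arrive.
by_year <- split(processed[setdiff(names(processed), "year")], processed$year)

# Apply: placeholder for the per-year step.
count_rows <- function(data) data.frame(n_rows = nrow(data))
results <- lapply(by_year, count_rows)

# Combine: bind the results, restoring the year from the list names.
combined <- do.call(
  rbind,
  unname(Map(
    function(result, year) cbind(year = as.integer(year), result),
    results,
    names(results)
  ))
)
combined
```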

A small note about the National Patient Register. It contains several
tables and types of data. The algorithm uses only hospital diagnosis
data, which is contained in four registers: two pairs of related
registers, one pair used before 2019 (LPR2) and one used from 2019
onward (LPR3). So the LPR2 to LPR3
equivalents are `lpr_adm` to `kontakter` and `lpr_diag` to `diagnoser`.
Most of the variables have equivalents as well, except that while
`c_spec` is the LPR2 equivalent of `hovedspeciale_ans` in LPR3, the
specialty values in `hovedspeciale_ans` are coded as literal specialty
names and are different from the padded integer codes that `c_spec`
contains.

At Statistics Denmark, these tables are provided as a mix of separate
files for each calendar year prior to 2019 (in LPR2 format) and a single
file containing all the data from 2019 onward (LPR3 format). The two
tables in each pair can be joined with either the `recnum` variable
(LPR2 data) or the `dw_ek_kontakt` variable (LPR3 data).
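
Editorial sketch (not part of the PR): the joins described above, shown on toy tables with base R `merge()`. Only the table names and the key variables `recnum` and `dw_ek_kontakt` come from the text; the other columns and all values are hypothetical.

```r
# Toy LPR2-style tables, joined on `recnum`.
lpr_adm <- data.frame(recnum = c(1, 2), admission = c("a", "b"))
lpr_diag <- data.frame(recnum = c(1, 1, 2), diagnosis = c("x", "y", "z"))
lpr2 <- merge(lpr_adm, lpr_diag, by = "recnum")

# Toy LPR3-style tables, joined on `dw_ek_kontakt`.
kontakter <- data.frame(dw_ek_kontakt = c(10, 11), contact = c("c", "d"))
diagnoser <- data.frame(dw_ek_kontakt = c(10, 11, 11), diagnosis = c("x", "y", "z"))
lpr3 <- merge(kontakter, diagnoser, by = "dw_ek_kontakt")

# Each joined table has one row per diagnosis.
nrow(lpr2)
nrow(lpr3)
```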

```{r}
for (register in osdc:::get_register_abbrev()) {
Review comment (Member):
While I almost always suggest not to use for loops, this is one of those cases that you need to, since these functions are used to create Markdown text, and it doesn't really work in a "functional" way.

  print(glue::glue("### {osdc:::register_as_md_header(register)}"))

  osdc:::variables_as_md_table(
    register,
    caption = glue::glue("Variables and their descriptions within the `{register}` register.")
  ) |>
    print()
Review comment (Member):
You need to print explicitly because values are not printed automatically inside a for loop, but we want all these things to be output.


  osdc:::register_data_as_md_table(
    register,
    caption = glue::glue("Simulated example of what the data looks like for the `{register}` register.")
  ) |>
    print()
}
```
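
Editorial sketch (not part of the PR): why the loop above needs explicit printing. In R, values are only auto-printed at the top level, so an expression evaluated inside a `for` loop produces no output unless it is passed to `print()` or `cat()`. The vector `items` is a hypothetical example, and `capture.output()` is used here only to make the effect visible outside of knitr.

```r
items <- c("alpha", "beta")

out <- capture.output(
  for (item in items) {
    # Evaluated, but nothing is printed: no auto-printing inside a loop.
    paste("###", item)

    # Explicit output, which a chunk with results = "asis" would
    # pass through to the document as Markdown.
    cat("### ", item, "\n\n", sep = "")
  }
)

# Only the lines written by cat() were captured.
out[nzchar(out)]
```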