First draft for text description of the data #69
@@ -0,0 +1,85 @@
#' Convert the register data sources into a Markdown table.
#'
#' @param caption Caption to add to the table.
#'
#' @return A character vector as a Markdown table.
#' @keywords internal
#'
registers_as_md_table <- function(caption = NULL) {
  rlang::check_installed("glue")
  rlang::check_installed("knitr")

  variable_description |>
    # Use strings in tidyselect verbs like select(); `.data$` is deprecated there.
    dplyr::select(
      "register_name",
      "register_abbrev",
      "start_year",
      "end_year"
    ) |>
    dplyr::mutate(
      end_year = dplyr::if_else(is.na(.data$end_year), "present", as.character(.data$end_year)),
      years = glue::glue("{start_year} - {end_year}"),
      register_abbrev = glue::glue("`{register_abbrev}`")
    ) |>
    dplyr::distinct() |>
    dplyr::select(
      "Register" = "register_name",
      "Abbreviation" = "register_abbrev",
      "Years" = "years"
    ) |>
    knitr::kable(caption = caption)
}

#' Convert the register name into text to use in a Markdown header.
#'
#' @param register The abbreviation of the register name.
#'
#' @return A character vector.
#' @keywords internal
#'
register_as_md_header <- function(register) {
  rlang::check_installed("glue")

  variable_description |>
    dplyr::distinct(.data$register_name, .data$register_abbrev) |>
    dplyr::filter(.data$register_abbrev == register) |>
    glue::glue_data(
      "`{register_abbrev}`: {register_name}"
    )
}

#' Convert the fake register data into a Markdown table.
#'
#' @param register The abbreviation of the register name.
#' @param caption A caption to add to the table.
#'
#' @return A character vector as a Markdown table.
#' @keywords internal
#'
register_data_as_md_table <- function(register, caption = NULL) {
  rlang::check_installed("glue")
  rlang::check_installed("knitr")

  register_data[[register]] |>
    # Show only the first few rows as an example.
    utils::head(4) |>
    knitr::kable(caption = caption)
}

#' Convert the variables for a register into a Markdown table.
#'
#' @inheritParams register_data_as_md_table
#'
#' @return A character vector as a Markdown table.
#' @keywords internal
#'
variables_as_md_table <- function(register, caption = NULL) {
  rlang::check_installed("glue")
  rlang::check_installed("knitr")
  rlang::check_installed("stringr")

  variable_description |>
    dplyr::filter(.data$register_abbrev == register) |>
    dplyr::select("variable_name", "english_description") |>
    dplyr::mutate(english_description = stringr::str_to_sentence(.data$english_description)) |>
    knitr::kable(caption = caption)
}
@@ -0,0 +1,83 @@
---
title: "Data sources"
output: rmarkdown::html_vignette
bibliography: references.bib
csl: vancouver.csl
vignette: >
  %\VignetteIndexEntry{Data sources}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  echo = FALSE,
  # Treat all chunk output as plain Markdown text rather than as code output.
  results = "asis",
  collapse = TRUE,
  comment = "#>"
)
library(tidyverse)
library(osdc)
```

This document describes the sources of data needed by the OSDC algorithm
and gives a brief overview of each source and what it might look like.
The algorithm uses these Danish registers as input data sources:

```{r, results='asis'}
# `:::` is needed because registers_as_md_table() is an internal
# (non-exported) function.
osdc:::registers_as_md_table("Danish registers used in the OSDC algorithm.")
```

In a future revision, the algorithm can also use the Danish Medical
Birth Register to extend the period of valid inclusions further back in
time than is possible using obstetric codes from the National Patient
Register.

## Expected data structure

This section describes what the data sources are expected to look like
when they are input into the OSDC algorithm. We try to mimic as closely
as possible how the raw data look within Statistics Denmark. Since
registers are often stored on a per-year basis, we don't expect a year
variable in the data itself. If you've processed the data so that it has
a year variable, you will likely need to use a split-apply-combine
approach when using the osdc package. We internally convert all variable
names to lower case, so we present them here in lower case, but case may
vary between data sources (and even between years in the same data
source) in real data.
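A minimal base-R sketch of that split-apply-combine approach, assuming a single made-up table that already has a `year` column. The data and the `process_year()` function are hypothetical placeholders (standing in for whatever you would run on one year's data, such as an osdc function), not part of the package.

```r
# Hypothetical example data with mixed-case names and a `year` column.
diagnoses <- data.frame(
  PNR = c("1", "2", "3", "4"),
  YEAR = c(2017, 2017, 2018, 2018),
  C_DIAG = c("DE11", "DE10", "DI10", "DE11")
)

# Lower-case the variable names, as osdc does internally.
names(diagnoses) <- tolower(names(diagnoses))

# Split: one data frame per year, like the per-year register files.
per_year <- split(diagnoses, diagnoses$year)

# Apply: hypothetical placeholder that drops `year` (which the raw
# per-year data doesn't have) and just counts the rows.
process_year <- function(data) {
  data$year <- NULL
  data.frame(n_rows = nrow(data))
}
results <- lapply(per_year, process_year)

# Combine: stack the per-year results and record which year each came from.
combined <- do.call(rbind, results)
combined$year <- as.integer(names(results))
combined
```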

A small note about the National Patient Register: it contains several
tables and types of data. The algorithm uses only the hospital diagnosis
data contained in four of these tables: a pair of related tables used
before 2019 (LPR2) and an equivalent pair used from 2019 onward (LPR3).
The LPR2 to LPR3 equivalents are `lpr_adm` to `kontakter` and `lpr_diag`
to `diagnoser`. Most of the variables have equivalents as well, but their
values are not always coded the same way: `c_spec` is the LPR2
equivalent of `hovedspeciale_ans` in LPR3, but `hovedspeciale_ans` codes
specialties as literal specialty names, whereas `c_spec` contains padded
integer codes.
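As a small illustration of the table correspondence described above, the LPR2 to LPR3 table names can be kept in a named lookup vector. This is only a sketch; the object name `lpr2_to_lpr3_tables` is hypothetical and not part of osdc. It maps table names only: `c_spec` and `hovedspeciale_ans` can't be converted by a simple rename, because their values are coded differently.

```r
# Hypothetical lookup of equivalent National Patient Register tables:
# names are LPR2 tables (before 2019), values are LPR3 tables (2019 onward).
lpr2_to_lpr3_tables <- c(
  lpr_adm = "kontakter",
  lpr_diag = "diagnoser"
)

# Find the LPR3 equivalent of an LPR2 table.
lpr2_to_lpr3_tables[["lpr_diag"]]
#> [1] "diagnoser"
```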

At Statistics Denmark, these tables are provided as a mix of separate
files for each calendar year prior to 2019 (in LPR2 format) and a single
file containing all the data from 2019 onward (in LPR3 format). The two
tables in each pair can be joined using the `recnum` variable (LPR2
data) or the `dw_ek_kontakt` variable (LPR3 data).
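A minimal sketch of that join in base R, using small simulated LPR2-style tables. Apart from `recnum`, the column names and values (`pnr`, `c_diag`) are illustrative assumptions, and a left join with `merge(all.x = TRUE)` is just one reasonable choice. The LPR3 tables would be joined the same way on `dw_ek_kontakt`.

```r
# Simulated, hypothetical LPR2-style tables linked by `recnum`.
lpr_adm <- data.frame(
  pnr = c("A", "B"),
  recnum = c("r1", "r2")
)
lpr_diag <- data.frame(
  recnum = c("r1", "r1", "r2"),
  c_diag = c("DE11", "DI10", "DE10")
)

# Left join: keep every row of `lpr_adm` and attach its diagnoses via `recnum`.
lpr2 <- merge(lpr_adm, lpr_diag, by = "recnum", all.x = TRUE)

# The LPR3 tables would be joined the same way:
# merge(kontakter, diagnoser, by = "dw_ek_kontakt", all.x = TRUE)
lpr2
```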

```{r}
# A for loop is used here because these functions create Markdown text
# piece by piece, which doesn't really work in a "functional" way.
for (register in osdc:::get_register_abbrev()) {
  print(glue::glue("### {osdc:::register_as_md_header(register)}"))

  # Values are not auto-printed inside a for loop, so each piece of
  # output has to be printed explicitly.
  osdc:::variables_as_md_table(
    register,
    caption = glue::glue("Variables and their descriptions within the `{register}` register.")
  ) |>
    print()

  osdc:::register_data_as_md_table(
    register,
    caption = glue::glue("Simulated example of what the data looks like for the `{register}` register.")
  ) |>
    print()
}
```
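The chunk above calls `print()` explicitly inside the loop. A minimal base-R sketch of why: expressions inside a `for` loop are never auto-printed, so without an explicit `print()` (or `cat()`) the loop produces no output at all. `capture.output()` is used here only to make the difference visible.

```r
# Inside a for loop, values are not auto-printed: this produces no output.
silent <- capture.output(for (i in 1:2) i)

# Printing explicitly emits output on every iteration.
printed <- capture.output(for (i in 1:2) print(i))

length(silent)
#> [1] 0
printed
#> [1] "[1] 1" "[1] 2"
```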
Whenever you use a function from a package that you set as a "suggests" dependency, you need to include a check that tells the user to install the package if it isn't installed. That's what these `rlang::check_installed()` calls do (when you use `use_package("packagename", "suggests")`, it tells you exactly what to do).