First draft for text description of the data #69
Merged
16 commits
69276a0
First draft. Still missing the actual algorithm logic (among other th…
f9ccbb9
added section on output (to-do)
71d1e78
Added description of changes from original validation and potential f…
7d2dfd9
Update algorithm_logic.Rmd
Aastedet 89f78d5
Renamed document/file to indicate the focus on data.
c878066
Added c_spec variable to lpr_adm table. Corrected recnum scheme: the …
e58f1fe
Merge branch 'main' into general-logic-description
Aastedet 71eab8f
added note on c_spec values LPR2 vs LPR3
5a16c74
Merge branch 'general-logic-description' of https://github.com/steno-…
66a232e
docs: apply suggestions from review
lwjohnst86 f2f49d4
Merged origin/main into general-logic-description
lwjohnst86 45850be
chore: rename file to be a bit clearer
lwjohnst86 1bc9931
docs: use code to create the table listing the registers
lwjohnst86 53108a9
feat: helper functions to insert data into Markdown vignettes
lwjohnst86 cf30ea7
build: include dependencies from the helper functions
lwjohnst86 d069d3a
docs: updated vignette so variable and register data tables are creat…
lwjohnst86
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,174 @@ | ||
--- | ||
title: "Description of algorithm contents & logic" | ||
output: rmarkdown::html_vignette | ||
bibliography: references.bib | ||
csl: vancouver.csl | ||
vignette: > | ||
%\VignetteIndexEntry{Design} | ||
%\VignetteEngine{knitr::rmarkdown} | ||
%\VignetteEncoding{UTF-8} | ||
--- | ||
|
||
```{r, include = FALSE} | ||
knitr::opts_chunk$set( | ||
collapse = TRUE, | ||
comment = "#>" | ||
) | ||
library(dplyr) | ||
``` | ||
|
||
## Contents | ||
|
||
This document describes the data components involved in the algorithm. | ||
It also describes the implemented algorithm logic, changes compared to | ||
the originally validated algorithm, and a roadmap for potential changes | ||
in future revisions. Refer to the other vignettes for background | ||
information and a more general description of the algorithm. | ||
|
||
## Data components | ||
|
||
The algorithm uses five different types of data, contained in five | ||
register sources: | ||
|
||
1. Hospital diagnoses | ||
- The National Patient Register [Landspatientregisteret] | ||
2. Prescription drugs purchased | ||
- The Register of Pharmaceutical Sales | ||
[Lægemiddelstatistikregisteret] | ||
3. Hemoglobin-A1c tests | ||
- The Register of Laboratory Results for Research | ||
[Laboratoriedatabasens Forskertabel] | ||
4. Diabetes-specific podiatrist services | ||
- The National Health Insurance Service Register | ||
[Sygesikringsregisteret] | ||
5. Sex & date of birth | ||
- The Danish Civil Registration System [CPR-registeret] | ||
|
||
In a future revision, the algorithm can also utilise the Danish Medical | ||
Birth Register to extend the period of time of valid inclusions further | ||
back in time compared to what is possible using obstetric codes from the | ||
National Patient Register. | ||
|
||
## Pre-processing steps | ||
|
||
This section describes the steps required to convert raw data into a | ||
format that can be used as input to the algorithm. The description | ||
assumes that the raw data is structured in the format most commonly | ||
provided on Statistics Denmark's servers (in our experience). | ||
|
||
Using the most common scenario when working with the above data on | ||
Statistics Denmark's servers, this section lists the common register | ||
abbreviations/raw file names, their structure (year-on-year files vs. a | ||
large single file, plus changes/breaks over time), raw variable names | ||
and relevant values. Variable names are presented in lower case here, | ||
but case may vary between data sources (and even between years in the | ||
same data source) in real data. | ||
|
||
Depending on the contents and format of your specific raw data, you may | ||
need to adapt the pre-processing pipeline accordingly. | ||
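Because the case of variable names may vary, one possible first pre-processing step is to convert all variable names to lower case. The following is a minimal sketch, not part of the package: the object `raw_lpr_adm` and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical raw extract with upper-case variable names (made-up values).
raw_lpr_adm <- tibble(
  PNR = c("01", "02"),
  RECNUM = c("001", "002"),
  D_INDDTO = as.Date(c("2003-01-31", "2003-02-01"))
)

# Convert every variable name to lower case so that later steps can rely
# on a single naming convention.
lpr_adm <- raw_lpr_adm %>%
  rename_with(tolower)

names(lpr_adm)
```

Applying the same step to every raw table before any joins avoids mismatches caused by differences in case between years or sources.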
|
||
## Structure of raw data | ||
|
||
### National Patient Register | ||
|
||
The National Patient Register contains several tables and types of data. | ||
The algorithm uses only hospital diagnosis data, which is contained in | ||
two tables: | ||
|
||
1. A table containing administrative information, e.g. personal ID, | ||
`pnr`/`cpr`, and the first date of the contact, | ||
`d_inddto`/`dato_start`. | ||
|
||
- Named `lpr_adm` in the LPR2-formatted data prior to 2019, and | ||
`kontakter` in contact-based LPR3-formatted data from 2019 | ||
onward. | ||
|
||
2. A table containing all information on diagnoses recorded at each | ||
contact, `c_diag`, and the type of diagnosis (e.g. primary or | ||
secondary to the contact), `c_diagtype`. | ||
|
||
- Named `lpr_diag` in the LPR2-formatted data prior to 2019, and | ||
`diagnoser` in contact-based LPR3-formatted data from 2019 | ||
onward. | ||
|
||
On Statistics Denmark's servers, these tables are provided as a mix of | ||
separate files for each calendar year prior to 2019 (in LPR2 format) and a single | ||
file containing all the data from 2019 onward (LPR3 format). The two | ||
tables can be joined with either the `recnum` variable (LPR2 data) or | ||
the `dw_ek_kontakt` variable (LPR3 data). | ||
|
||
Examples of these data are shown below: | ||
|
||
| pnr | recnum | d_inddto | | ||
|-----|--------|------------| | ||
| 01 | 001 | 2003-01-31 | | ||
| 02 | 002 | 2003-02-01 | | ||
| 02 | 003 | 2003-02-01 | | ||
|
||
: Raw structure of lpr_adm: administrative data in the National Patient | ||
Register before 2019. Corresponding variable names from 2019 onward: | ||
`pnr` = `cpr`, `recnum` = `dw_ek_kontakt`, `d_inddto` = `dato_start` | ||
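The variable name correspondence in the caption above could be used to bring LPR3-formatted administrative data into the LPR2 naming scheme before the two are stacked. This is a sketch under assumptions: the `kontakter` rows are made up for illustration, the object names `kontakter_renamed` and `adm_all` are hypothetical, and whether the package harmonises the data this way is not stated here.

```r
library(dplyr)

# LPR2-formatted administrative data (values from the example table above).
lpr_adm <- tibble(
  pnr = c("01", "02", "02"),
  recnum = c("001", "002", "003"),
  d_inddto = as.Date(c("2003-01-31", "2003-02-01", "2003-02-01"))
)

# LPR3-formatted administrative data (made-up row, for illustration only).
kontakter <- tibble(
  cpr = "01",
  dw_ek_kontakt = "900",
  dato_start = as.Date("2019-03-15")
)

# Rename the LPR3 variables to their LPR2 counterparts (new_name = old_name).
kontakter_renamed <- kontakter %>%
  rename(pnr = cpr, recnum = dw_ek_kontakt, d_inddto = dato_start)

# Stack the two periods into one table with a common set of variable names.
adm_all <- bind_rows(lpr_adm, kontakter_renamed)
```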
|
||
| recnum | c_diag | c_diagtype | | ||
|--------|--------|------------| | ||
| 001 | DE101 | A | | ||
| 002 | DI21 | A | | ||
| 003 | DE115 | B | | ||
|
||
: Raw structure of lpr_diag: diagnosis data in the National Patient | ||
Register before 2019. Corresponding variable names from 2019 onward: | ||
`recnum` = `dw_ek_kontakt`, `c_diag` = `diagnosekode`, `c_diagtype` = | ||
`diagnosetype` | ||
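Joining the two example tables above on `recnum` attaches the personal ID and contact date to each diagnosis. The sketch below uses the values from the example tables. The choice of `inner_join()` and the object name `diagnoses` are assumptions made for illustration, not a description of the package's implementation.

```r
library(dplyr)

# Administrative table (values from the lpr_adm example above).
lpr_adm <- tibble(
  pnr = c("01", "02", "02"),
  recnum = c("001", "002", "003"),
  d_inddto = as.Date(c("2003-01-31", "2003-02-01", "2003-02-01"))
)

# Diagnosis table (values from the lpr_diag example above).
lpr_diag <- tibble(
  recnum = c("001", "002", "003"),
  c_diag = c("DE101", "DI21", "DE115"),
  c_diagtype = c("A", "A", "B")
)

# Keep diagnoses that have a matching contact and add the personal ID and
# the first date of the contact to each of them.
diagnoses <- lpr_diag %>%
  inner_join(lpr_adm, by = "recnum")

diagnoses
```

For LPR3-formatted data, the same join would use `by = "dw_ek_kontakt"` on the `diagnoser` and `kontakter` tables.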
|
||
### Register of Pharmaceutical Sales | ||
|
||
To-do | ||
|
||
### National Health Insurance Service Register | ||
|
||
To-do | ||
|
||
Content: SSSY and SYSI (overlap in 2005) | ||
|
||
### Register of Laboratory Results for Research | ||
|
||
To-do | ||
|
||
### Civil Registration System | ||
|
||
To-do | ||
|
||
## Expected input | ||
|
||
This section describes the required structure of the data objects that | ||
can be used as input parameters to the OSDC algorithm (preferably | ||
presented as table examples, maybe based on the synthetic data objects). | ||
|
||
## Algorithm logic | ||
|
||
This section describes what operations are performed on the input data. | ||
|
||
## Expected output | ||
|
||
This section describes the output object. | ||
|
||
## Changes since original validation | ||
|
||
1. Purchases of semaglutide, dapagliflozin or empagliflozin are no | ||
longer used for inclusion events or classification of diabetes type | ||
(due to their increasing use in the treatment of conditions other than diabetes). | ||
2. Diabetes type reclassification based on insulin purchases in the | ||
previous year is no longer used. | ||
|
||
## Roadmap for potential changes | ||
|
||
1. Add support for using the Medical Birth Register to define pregnancies | ||
in order to censor gestational diabetes mellitus (GDM). This would allow | ||
censoring purchases of glucose-lowering drugs (GLD) all the way back to | ||
1995 (rather than from 1997 onward, the period to which the obstetric | ||
codes are limited), and would extend the window of valid dates of | ||
diagnosis to 1996 onward. | ||
2. Simplify logic defining pregnancy index dates to remove dependency | ||
on maternal care visits (if performance in validation allows). | ||
3. Limit the scope of primary diagnoses used to evaluate majority of | ||
diabetes-specific diagnoses in type classification. | ||
|
You could add links to specific vignettes with relevant information here?