docs: 📝 expand on inclusions and exclusions #133

Merged
merged 33 commits into from
Dec 19, 2024
Changes from 15 commits
90c7492
Fleshed out and updated include_gld_purchases() flow documentation
Sep 18, 2024
0504532
Added description of podiatrist services function flow
Sep 18, 2024
7777f36
Reformated some GLD text, added HbA1c and started on pregnancy dates
Sep 18, 2024
6382624
Reworded include_hba1c section
Sep 19, 2024
4fd5903
Added lpr-joins, started on describing lpr processing
Sep 19, 2024
29fea86
Finished LPR/diagnosis part of function flow
Sep 19, 2024
f9d7661
fixed a new things to describe LPR3 processing
Sep 19, 2024
9a05d81
specified that only primary diagnoses go into type classification
Sep 19, 2024
f03a4da
Update vignettes/function-flow.Rmd
Aastedet Sep 19, 2024
bc889d4
switched the order of inclusion sections and mentioned that some of t…
Sep 19, 2024
8d60bd0
Merge branch 'update-function-flow' of https://github.com/steno-aarhu…
Sep 19, 2024
9a74ea0
Merge branch 'main' into update-function-flow
Aastedet Sep 20, 2024
7525b60
fixed spec to speciale variable name
Sep 20, 2024
25db86d
Merge branch 'update-function-flow' of https://github.com/steno-aarhu…
Sep 20, 2024
092824e
Removed "name" or "vnr" variables from GLD function flow.
Sep 20, 2024
20f5886
Updates join_lpr function description to filter to necessary diagnoses.
Sep 20, 2024
61b5d27
Removed section on weightloss drugs, since we're no longer including …
Sep 20, 2024
7b9738d
Update vignettes/function-flow.Rmd
Aastedet Sep 20, 2024
3a95d4f
Added description of exclude_potential_pcos()
Sep 20, 2024
f663844
Merge branch 'update-function-flow' of https://github.com/steno-aarhu…
Sep 20, 2024
fe257a6
Renamed some variables.
Sep 20, 2024
4bba18e
Added censoring/exclusion function description
Sep 20, 2024
7cca920
Added correct diagnoses to filter to in lpr_join() functions.
Sep 27, 2024
b40412c
changed specialty values to align with the PR with a refactored creat…
Sep 27, 2024
35118e8
Joining inclusions and definition. Looking to add type classification.
Dec 16, 2024
cbee21f
Removed helper function for dropping first event as it seemed a bit e…
Dec 17, 2024
3820459
Added function flow description of get_diabetes_type() and its helper…
Dec 17, 2024
d294071
docs: :pencil2: small edits from review
lwjohnst86 Dec 18, 2024
159ab16
Merge branch 'main' of https://github.com/steno-aarhus/osdc into upda…
lwjohnst86 Dec 18, 2024
b94e36d
Merge branch 'update-function-flow' of https://github.com/steno-aarhu…
lwjohnst86 Dec 18, 2024
da55507
Update vignettes/function-flow.Rmd
Aastedet Dec 19, 2024
8cc3cee
Merge branch 'main' of https://github.com/steno-aarhus/osdc into upda…
lwjohnst86 Dec 19, 2024
822da79
Merge branch 'update-function-flow' of https://github.com/steno-aarhu…
lwjohnst86 Dec 19, 2024
224 changes: 182 additions & 42 deletions vignettes/function-flow.Rmd
@@ -97,76 +97,215 @@ library(dplyr)
library(osdc)
```

#### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%)
#### Hospital diagnoses

The function `include_hba1c()` uses `lab_forsker` as the input data to
extract all events of HbA1c tests above the diagnosis cut-off value.
**Joining LPR2 and LPR3 data**

Since the HbA1c diagnosis cut-off value depends on the kind of test that is
used, the inclusion event is defined as follows:
The helper functions `join_lpr2()` and `join_lpr3()` join records of
diagnoses to administrative information in LPR2-formatted and
LPR3-formatted data, respectively.

- For HbA1c IFCC (NPU03835), we include values \>= 6.5 %.
- For HbA1c DCCT (NPU27300), we include values \>= 48 mmol/mol.
`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, joins the
required information by record number (`recnum`), and outputs a
`data.frame` with the following variables:

```{r, echo=FALSE}
algorithm |>
  filter(name == "hba1c") |>
  knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.")
```
- identifier variable (`pnr`)
- date (`d_inddto`)
- department specialty (`c_spec`)
- diagnosis code (`c_diag`)
- diagnosis type (`c_diagtype`)

#### Hospital diagnosis of diabetes
`join_lpr3()` takes `diagnoser` and `kontakter` as inputs, joins the
required information by record number (`dw_ek_kontakt`), and outputs a
`data.frame` with the following variables:

The function `include_diabetes_diagnoses()` uses the hospital contacts
from LPR2 and 3 to include all dates of diabetes diagnoses. Diabetes
diagnoses from both ICD 8 and ICD 10 are included.
- identifier variable (`cpr`)
- date (`dato_start`)
- department specialty (`hovedspeciale_ans`)
- diagnosis code (`diagnosekode`)
- diagnosis type (`diagnosetype`)
- diagnosis retracted (`senere_afkraeftet`)

This function contains two helper functions:
These outputs are passed to `include_diabetes_diagnoses()` (and to
`get_pregnancy_dates()`, see exclusion events) for further processing
below.
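
The LPR2 join described above can be sketched in base R as follows. This is only an illustration on invented toy data, not the package's implementation: the register contents, the column types, and the choice of an inner join are assumptions, while the variable names are the ones listed above.

```r
# Toy stand-ins for the LPR2 registers (all values are invented).
lpr_diag <- data.frame(
  recnum = c("1", "1", "2"),
  c_diag = c("DE110", "DI10", "250"),
  c_diagtype = c("A", "B", "A")
)
lpr_adm <- data.frame(
  recnum = c("1", "2", "3"),
  pnr = c("p01", "p02", "p03"),
  d_inddto = as.Date(c("2001-03-01", "1990-06-15", "2005-01-01")),
  c_spec = c(8, 1, 8)
)

# Assumption: an inner join on the record number, so contacts without any
# diagnosis record (recnum "3") are dropped.
lpr2_joined <- merge(lpr_diag, lpr_adm, by = "recnum")

# Keep only the variables listed for the join_lpr2() output.
lpr2_joined <- lpr2_joined[, c("pnr", "d_inddto", "c_spec", "c_diag", "c_diagtype")]
```

The LPR3 join would follow the same pattern with `diagnoser`, `kontakter`, and `dw_ek_kontakt` as the key.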

- `keep_diabetes_icd10()`
- `keep_diabetes_icd8()`
**Processing of diabetes diagnoses**

<!-- TODO: Add details on how this filtering should be done, e.g., diagnosis codes -->

<!-- TODO: Which specific ICD 8 and 10 codes are included? -->
The function `include_diabetes_diagnoses()` uses the hospital contacts
from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for
inclusion, as well as additional information needed to classify diabetes
type. Diabetes diagnoses from both ICD-8 and ICD-10 are included.

The function takes the outputs of `join_lpr2()` and `join_lpr3()` as
inputs and processes each input separately to generate the following
internal variables:

- LPR2 data:
    - `pnr`: identifier variable
    - `do_diagnosis`: include all diabetes diagnoses, registered as
      primary (A) or secondary (B) diagnoses, regardless of type or
      department: `c_diag` starts with "DE1[0-4]", "249", or "250" and
      `c_diagtype` is either "A" or "B"
    - `is_primary`: Define whether the diagnosis was a primary
      diagnosis (`c_diagtype` == "A")
    - `is_t1d`: Define whether the diagnosis was T1D-specific
      (`c_diag` starts with "DE10" or "249")
    - `is_t2d`: Define whether the diagnosis was T2D-specific
      (`c_diag` starts with "DE11" or "250")
    - `department`: Define whether the diagnosis was made by an
      endocrinological (`c_spec` == 8) or other medical department
      (`c_spec` \< 8 or 9-30)
- LPR3 data:
    - `pnr`: identifier variable
    - `do_diagnosis`: include all diabetes diagnoses, registered as
      primary (A) or secondary (B) diagnoses, regardless of type or
      department: `diagnosekode` starts with "DE1[0-4]" and
      `diagnosetype` is either "A" or "B", but exclude retracted
      diagnoses (`senere_afkraeftet` == "Ja")
    - `is_primary`: Define whether the diagnosis was a primary
      diagnosis (`diagnosetype` == "A")
    - `is_t1d`: Define whether the diagnosis was T1D-specific
      (`diagnosekode` starts with "DE10")
    - `is_t2d`: Define whether the diagnosis was T2D-specific
      (`diagnosekode` starts with "DE11")
    - `department`: Define whether the diagnosis was made by an
      endocrinological (`hovedspeciale_ans` == "medicinsk
      endokrinologi") or other medical department (`hovedspeciale_ans`
      either "Blandet medicin og kirurgi", "Intern medicin",
      "Geriatri", "Hepatologi", "Hæmatologi", "Infektionsmedicin",
      "Kardiologi", "Medicinsk allergologi", "Medicinsk
      gastroenterologi", "Medicinsk lungesygdomme", "Nefrologi",
      "Reumatologi", "Palliativ medicin", "Akut medicin",
      "Dermato-venerologi", "Neurologi", "Onkologi", "Fysiurgi", or
      "Tropemedicin")

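As a rough illustration of the LPR2 rules listed above, the flags could be derived with regular expressions in base R. The toy data, the numeric coding of `c_spec`, and the labels "endocrinology" and "other medical" are assumptions made for this sketch; only the variable names and code patterns come from the description.

```r
# Toy output of join_lpr2() (all values are invented).
lpr2 <- data.frame(
  pnr = c("p01", "p01", "p02", "p03", "p04"),
  c_diag = c("DE101", "DE119", "250", "DI10", "DE14"),
  c_diagtype = c("A", "B", "A", "A", "H"),
  c_spec = c(8, 2, 8, 8, 40)
)

# Keep diabetes codes registered as primary (A) or secondary (B) diagnoses.
is_diabetes <- grepl("^(DE1[0-4]|249|250)", lpr2$c_diag) &
  lpr2$c_diagtype %in% c("A", "B")
diabetes <- lpr2[is_diabetes, ]

# Flags used later for type classification.
diabetes$is_primary <- diabetes$c_diagtype == "A"
diabetes$is_t1d <- grepl("^(DE10|249)", diabetes$c_diag)
diabetes$is_t2d <- grepl("^(DE11|250)", diabetes$c_diag)

# Hypothetical labels; specialties outside the listed ranges become NA.
is_other_medical <- diabetes$c_spec < 8 |
  (diabetes$c_spec >= 9 & diabetes$c_spec <= 30)
diabetes$department <- ifelse(
  diabetes$c_spec == 8,
  "endocrinology",
  ifelse(is_other_medical, "other medical", NA_character_)
)
```

The LPR3 branch would differ only in the variable names, the text-valued specialties, and the extra filter on `senere_afkraeftet`.
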
These intermediate results are combined for further processing, and
`include_diabetes_diagnoses()` outputs a single `data.frame` with the
following variables (up to two rows per individual):

- identifier variable (`pnr`)
- dates of the first and second hospital diabetes diagnosis
(`diagnosis_dates`)
- number of type 1 diabetes-specific primary diagnosis codes from
endocrinological departments (`n_t1d_endo`)
- number of type 2 diabetes-specific primary diagnosis codes from
endocrinological departments (`n_t2d_endo`)
- number of type 1 diabetes-specific primary diagnosis codes from
medical departments (`n_t1d_medical`)
- number of type 2 diabetes-specific primary diagnosis codes from
medical departments (`n_t2d_medical`)

The output is passed to the `get_diagnosis_date()` function for the
final step of the inclusion process and is subsequently used to classify
diabetes type.

#### Diabetes-specific podiatrist services

The function `include_podiatrist_services()` uses `sysi` or `sssy` as
input to extract the dates of all diabetes-specific podiatrist services.

<!-- TODO: Add details on how this filtering should be done -->
These dates are extracted by filtering values beginning with "54" in the
`speciale` variable of the `sssy` and `sysi` registers by default
(alternatively, the function can take the `spec2` variable as input
instead, if that is the data available to the user). In addition,
services provided to a child of the individual (`barnmak` != 0) are
excluded using the `barnmak` variable. An internal helper function
`get_unique_honuge_dates()` is applied to generate a proper date
variable based on the week-year (wwyy-formatted) variable (`honuge`)
found in the raw data, and de-duplicates multiple services registered on
the same date.
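
A minimal sketch of the filtering and date handling described above, in base R. The helper `honuge_to_date()` is hypothetical and is not the package's `get_unique_honuge_dates()`. It assumes that `honuge` holds an ISO week number followed by a two-digit year, that years 90 to 99 belong to the 1900s, and that each week is represented by its Monday. The toy `speciale` values are invented.

```r
# Hypothetical helper: convert a wwyy string to the Monday of that ISO week.
# Assumption: two-digit years 90-99 mean 1990-1999, all others mean 20yy.
honuge_to_date <- function(honuge) {
  week <- as.integer(substr(honuge, 1, 2))
  yy <- as.integer(substr(honuge, 3, 4))
  year <- ifelse(yy >= 90L, 1900L + yy, 2000L + yy)
  # 4 January always falls in ISO week 1.
  jan4 <- as.Date(sprintf("%d-01-04", year))
  # ISO weekday of 4 January (Monday = 1, ..., Sunday = 7).
  iso_wday <- as.integer(format(jan4, "%u"))
  jan4 - (iso_wday - 1L) + 7L * (week - 1L)
}

# Toy register rows (all values are invented).
sssy <- data.frame(
  pnr = c("p01", "p01", "p01", "p02"),
  speciale = c("540005", "540005", "100110", "540020"),
  barnmak = c(0, 0, 0, 1),
  honuge = c("0120", "0120", "0220", "1099")
)

# Keep podiatrist services ("54...") that were not provided to a child.
keep <- startsWith(sssy$speciale, "54") & sssy$barnmak == 0

# Convert to dates and drop services registered on the same date.
podiatrist <- unique(data.frame(
  pnr = sssy$pnr[keep],
  date = honuge_to_date(sssy$honuge[keep])
))
```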

`include_podiatrist_services()` outputs a 3-column data frame with one
row for each individual, containing the following variables:

- identifier variable (`pnr`)
- the date of the first diabetes-specific podiatrist record
(`do_podiatrist_1`)
- the date of the second diabetes-specific podiatrist record
(`do_podiatrist_2`)

The output is passed to the `get_diagnosis_date()` function for the
final step of the inclusion process.

#### GLD purchases
#### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%)

The function `include_gld_purchases()` uses `lmdb` to extract the dates
of all GLD purchases (from 1997 onwards).
The function `include_hba1c()` uses `lab_forsker` as the input data to
extract the dates of all elevated HbA1c test results, using the
appropriate cut-offs:

<!-- TODO: Add details on how this filtering should be done -->
- IFCC units: `analysiscode` NPU27300, any `value` $\geq$ 48 mmol/mol
- DCCT units: `analysiscode` NPU03835, any `value` $\geq$ 6.5%

```{r, echo=FALSE}
algorithm |>
  filter(name == "hba1c") |>
  knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.")
```

Multiple elevated results on the same day within each individual are
deduplicated, to account for the same test result often being reported
twice (once in IFCC units, once in DCCT units).
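
The cut-off filtering and same-day deduplication might look roughly like this in base R. The toy data and the date column name `samplingdate` are assumptions for the sake of the example; `analysiscode`, `value`, and the cut-offs are taken from the description above.

```r
# Toy laboratory data (all values are invented).
lab_forsker <- data.frame(
  pnr = c("p01", "p01", "p01", "p02", "p03"),
  samplingdate = as.Date(c(
    "2015-05-01", "2015-05-01", "2016-01-10", "2015-07-07", "2015-08-08"
  )),
  analysiscode = c("NPU27300", "NPU03835", "NPU27300", "NPU03835", "NPU27300"),
  value = c(52, 6.9, 47, 6.5, 48)
)

# Apply the unit-specific cut-off to each analysis code.
elevated <-
  (lab_forsker$analysiscode == "NPU27300" & lab_forsker$value >= 48) |
  (lab_forsker$analysiscode == "NPU03835" & lab_forsker$value >= 6.5)

# One row per individual and date, so a result reported in both units
# on the same day counts once.
hba1c <- unique(data.frame(
  pnr = lab_forsker$pnr[elevated],
  dates = lab_forsker$samplingdate[elevated]
))
```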

`include_hba1c()` outputs a 2-column data frame containing the following
variables:

<!-- TODO: Add this + link to resource "For details about this, see [link]." -->
- identifier variable (`pnr`)
- the dates of all elevated HbA1c test results (`dates`).

The output is passed to the `exclude_pregnancy()` function for censoring
of elevated results due to potential gestational diabetes (see below).

#### GLD purchases

The function `include_gld_purchases()` uses `lmdb` to extract the dates
of all GLD purchases.

These dates are extracted by including all values beginning with "A10"
in the `atc` variable of the `lmdb` register. Since the diagnosis code
data on pregnancies (see below) is insufficient to perform censoring
prior to 1997, `include_gld_purchases()` only extracts dates from 1997
onward by default (if Medical Birth Register data is available to use
for censoring, the extraction window can be extended).
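
As a sketch of this extraction step in base R: the toy data and the `start_date` variable are hypothetical and only illustrate the default 1997 window; whether the function exposes such a parameter is not stated here.

```r
# Toy prescription data (all values are invented).
lmdb <- data.frame(
  pnr = c("p01", "p01", "p02", "p03"),
  eksd = as.Date(c("1996-12-31", "1997-01-01", "2003-04-05", "2010-10-10")),
  atc = c("A10BA02", "A10AB01", "C10AA01", "A10BJ02"),
  volume = c(100, 15, 30, 3),
  apk = c(1, 2, 1, 1),
  indo = c("0000000", "0000092", "0000000", "0000276")
)

# Hypothetical default: only keep purchases from 1997 onward.
start_date <- as.Date("1997-01-01")

# GLD purchases are those with an ATC code beginning with "A10".
gld_purchases <- lmdb[
  startsWith(lmdb$atc, "A10") & lmdb$eksd >= start_date,
  c("pnr", "eksd", "atc", "volume", "apk", "indo")
]
```

Moving `start_date` earlier would extend the extraction window, which only makes sense when pregnancy censoring is possible for the earlier period.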

This function outputs a `data.frame` with the following variables needed
later in the classification part of the function flow:

- identifier variable (`pnr`)
- date (`eksd`)
- type of drug (`atc`)
- amount purchased (`volume` and `apk`)
- indication code (`indo`)

These events are then passed to a chain of exclusion functions,
`exclude_wld_purchases()`, `exclude_potential_pcos()`, and
`exclude_pregnancy()`, described in the sections below.

After these exclusion functions have been applied, the output serves as
input to two sets of functions:

1. the `get_diagnosis_date()` function for the final step of the
inclusion process.
2. the `get_only_insulin_purchases()`,
`get_insulin_purchases_within_180_days()`, and
`get_insulin_is_two_thirds_of_gld_doses()` helper functions for the
classification of diabetes type.

### Exclusion events

#### HbA1c tests and GLD purchases during pregnancy

The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as
input and is used to exclude both HbA1c tests and GLD purchases during
pregnancy.
pregnancy, as these may be due to gestational diabetes, rather than type
1 or type 2 diabetes.

Internally, this relies on the function `get_pregnancy_dates()` that
contains the following three helper functions:

- `calculate_pregnancy_index_date_for_mc_visits_wo_end_date()` (this
might be removed with the inclusion of the birth register)
- `get_pregnancy_end_dates()`: Keep maternal care visits with an end
date and drop visits between 40 weeks before end date and 12 weeks
after end date.
- `get_maternal_care_visit_dates_without_end_date()`: Uses the output
from `get_pregnancy_end_dates()` which identifies maternal care
visits *with* end dates to derive maternal care visits *without* end
dates. below.

<!-- TODO: What is done with the mc visits without end dates then? -->
uses diagnoses registered in the National Patient Register to extract
the dates of all pregnancy endings (live births or miscarriages). These
are identified by filtering values beginning with "DO0[0-6]", "DO8[0-4]"
or "DZ3[37]" in the `c_diag` variable in the LPR2 data (`diagnosekode`
in LPR3 data).
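
The code patterns above can be combined into a single regular expression. This is an illustrative base R sketch on invented diagnosis codes, not the package's implementation.

```r
# Pattern built from the three code ranges named above.
pregnancy_pattern <- "^(DO0[0-6]|DO8[0-4]|DZ3[37])"

# Invented example values for `c_diag` (LPR2) or `diagnosekode` (LPR3).
codes <- c("DO800", "DO021", "DZ370", "DZ340", "DO10", "DE110", "DZ33")

# TRUE where the code marks a pregnancy ending.
is_pregnancy_ending <- grepl(pregnancy_pattern, codes)
```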

<!-- TODO: Add details on how this filtering should be done -->

@@ -299,3 +438,4 @@ is within a time-period of insufficient data coverage,
contains the inclusion date of this individual.

<!-- TODO: Specify the "stable" time-period: e.g., later than 1997 -->