docs: 📝 expand on inclusions and exclusions #133

Merged
merged 33 commits into from
Dec 19, 2024
Changes from 22 commits
90c7492
Fleshed out and updated include_gld_purchases() flow documentation
Sep 18, 2024
0504532
Added description of podiatrist services function flow
Sep 18, 2024
7777f36
Reformated some GLD text, added HbA1c and started on pregnancy dates
Sep 18, 2024
6382624
Reworded include_hba1c section
Sep 19, 2024
4fd5903
Added lpr-joins, started on describing lpr processing
Sep 19, 2024
29fea86
Finished LPR/diagnosis part of function flow
Sep 19, 2024
f9d7661
fixed a new things to describe LPR3 processing
Sep 19, 2024
9a05d81
specified that only primary diagnoses go into type classification
Sep 19, 2024
f03a4da
Update vignettes/function-flow.Rmd
Aastedet Sep 19, 2024
bc889d4
switched the order of inclusion sections and mentioned that some of t…
Sep 19, 2024
8d60bd0
Merge branch 'update-function-flow' of https://github.com/steno-aarhu…
Sep 19, 2024
9a74ea0
Merge branch 'main' into update-function-flow
Aastedet Sep 20, 2024
7525b60
fixed spec to speciale variable name
Sep 20, 2024
25db86d
Merge branch 'update-function-flow' of https://github.com/steno-aarhu…
Sep 20, 2024
092824e
Removed "name" or "vnr" variables from GLD function flow.
Sep 20, 2024
20f5886
Updates join_lpr function description to filter to necessary diagnoses.
Sep 20, 2024
61b5d27
Removed section on weightloss drugs, since we're no longer including …
Sep 20, 2024
7b9738d
Update vignettes/function-flow.Rmd
Aastedet Sep 20, 2024
3a95d4f
Added description of exclude_potential_pcos()
Sep 20, 2024
f663844
Merge branch 'update-function-flow' of https://github.com/steno-aarhu…
Sep 20, 2024
fe257a6
Renamed some variables.
Sep 20, 2024
4bba18e
Added censoring/exclusion function description
Sep 20, 2024
7cca920
Added correct diagnoses to filter to in lpr_join() functions.
Sep 27, 2024
b40412c
changed specialty values to align with the PR with a refactored creat…
Sep 27, 2024
35118e8
Joining inclusions and definition. Looking to add type classification.
Dec 16, 2024
cbee21f
Removed helper function for dropping first event as it seemed a bit e…
Dec 17, 2024
3820459
Added function flow description of get_diabetes_type() and its helper…
Dec 17, 2024
d294071
docs: :pencil2: small edits from review
lwjohnst86 Dec 18, 2024
159ab16
Merge branch 'main' of https://github.com/steno-aarhus/osdc into upda…
lwjohnst86 Dec 18, 2024
b94e36d
Merge branch 'update-function-flow' of https://github.com/steno-aarhu…
lwjohnst86 Dec 18, 2024
da55507
Update vignettes/function-flow.Rmd
Aastedet Dec 19, 2024
8cc3cee
Merge branch 'main' of https://github.com/steno-aarhus/osdc into upda…
lwjohnst86 Dec 19, 2024
822da79
Merge branch 'update-function-flow' of https://github.com/steno-aarhu…
lwjohnst86 Dec 19, 2024
253 changes: 195 additions & 58 deletions vignettes/function-flow.Rmd
@@ -97,98 +97,234 @@ library(dplyr)
library(osdc)
```

#### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%)
#### Hospital diagnoses

The function `include_hba1c()` uses `lab_forsker` as the input data to
extract all events of HbA1c tests above the diagnosis cut-off value.
**Joining LPR2 and LPR3 data**

Since the HbA1c diagnosis cut-off value depends on the kind of test that is
used, the inclusion event is defined as follows:
The helper functions `join_lpr2()` and `join_lpr3()` join records of
diagnoses to administrative information in LPR2-formatted and
LPR3-formatted data, respectively.

- For HbA1c IFCC (NPU03835), we include values \>= 6.5 %.
- For HbA1c DCCT (NPU27300), we include values \>= 48 mmol/mol.
`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, filters to the
necessary diagnoses (`c_diag` starting with "DO", "DZ3", "DE1[0-4]",
"249", or "250"), joins the required information by record number
(`recnum`), and outputs a `data.frame` with the following variables:

```{r, echo=FALSE}
algorithm |>
filter(name=="hba1c") |>
knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.")
```
- identifier variable (`pnr`)
- date (originally `d_inddto`, renamed to `date`)
- department specialty (`c_spec`)
- diagnosis code (`c_diag`)
- diagnosis type (`c_diagtype`)
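
The filter-then-join step described for `join_lpr2()` can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the function name `join_lpr2_sketch()` and all data values are hypothetical, and only the variable names and code prefixes come from the description above.

```r
# Toy stand-ins for the LPR2 registers; all values are invented.
lpr_diag <- data.frame(
  recnum = c(1, 2, 3, 4),
  c_diag = c("DE110", "DJ189", "DO800", "25000"),
  c_diagtype = c("A", "A", "B", "A"),
  stringsAsFactors = FALSE
)
lpr_adm <- data.frame(
  recnum = c(1, 2, 3, 4),
  pnr = c("001", "001", "002", "003"),
  d_inddto = as.Date(c("2005-03-01", "2006-07-15", "2010-01-20", "1990-05-05")),
  c_spec = c(8, 1, 38, 8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: filter to the necessary diagnoses, then join by recnum.
join_lpr2_sketch <- function(lpr_diag, lpr_adm) {
  # Keep pregnancy-related and diabetes-related codes only.
  keep <- grepl("^(DO|DZ3|DE1[0-4]|249|250)", lpr_diag$c_diag)
  joined <- merge(lpr_diag[keep, ], lpr_adm, by = "recnum")
  # Rename the admission date to the common name used downstream.
  names(joined)[names(joined) == "d_inddto"] <- "date"
  joined[, c("pnr", "date", "c_spec", "c_diag", "c_diagtype")]
}

lpr2 <- join_lpr2_sketch(lpr_diag, lpr_adm)
```

Filtering before joining keeps the join small, since most diagnosis records are not needed downstream.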

#### Hospital diagnosis of diabetes

The function `include_diabetes_diagnoses()` uses the hospital contacts
from LPR2 and 3 to include all dates of diabetes diagnoses. Diabetes
diagnoses from both ICD 8 and ICD 10 are included.
`join_lpr3()` takes `diagnoser` and `kontakter` as inputs, filters to
the necessary diagnoses (`diagnosekode` starting with "DO", "DZ3", or
"DE1[0-4]"), joins the required information by record number
(`dw_ek_kontakt`), and outputs a `data.frame` with the following
variables:

This function contains two helper functions:
- identifier variable (originally `cpr`, renamed to `pnr`)
- date (originally `dato_start`, renamed to `date`)
- department specialty (`hovedspeciale_ans`)
- diagnosis code (`diagnosekode`)
- diagnosis type (`diagnosetype`)
- diagnosis retracted (`senere_afkraeftet`)

- `keep_diabetes_icd10()`
- `keep_diabetes_icd8()`
These outputs are passed to `include_diabetes_diagnoses()` (and to
`get_pregnancy_dates()`, see exclusion events) for further processing,
as described below.

<!-- TODO: Add details on how this filtering should be done, e.g., diagnosis codes -->
**Processing of diabetes diagnoses**

<!-- TODO: Which specific ICD 8 and 10 codes are included? -->
The function `include_diabetes_diagnoses()` uses the hospital contacts
from LPR2 and LPR3 to extract the dates of all diabetes diagnoses used
for inclusion, as well as the additional information needed to classify
diabetes type. Diabetes diagnoses from both ICD-8 and ICD-10 are included.

The function takes the outputs of `join_lpr2()` and `join_lpr3()` as
inputs and processes each input separately to generate the following
internal variables:

- LPR2 data:
    - `pnr`: identifier variable
    - `dates`: dates of all included diabetes diagnoses, registered as
      primary (A) or secondary (B) diagnoses, regardless of type or
      department:
        - `c_diag` starts with "DE1[0-4]", "249", or "250" and
          `c_diagtype` is either "A" or "B"
    - `is_primary`: Define whether the diagnosis was a primary
      diagnosis (`c_diagtype` == "A")
    - `is_t1d`: Define whether the diagnosis was T1D-specific
      (`c_diag` starts with "DE10" or "249")
    - `is_t2d`: Define whether the diagnosis was T2D-specific
      (`c_diag` starts with "DE11" or "250")
    - `department`: Define whether the diagnosis was made by an
      endocrinological (`c_spec` == 8) or other medical department
      (`c_spec` \< 8 or 9-30)
- LPR3 data:
    - `pnr`: identifier variable
    - `dates`: dates of all included diabetes diagnoses, registered as
      primary (A) or secondary (B) diagnoses, regardless of type or
      department, but excluding retracted diagnoses:
        - `diagnosekode` starts with "DE1[0-4]", `diagnosetype` is
          either "A" or "B", and `senere_afkraeftet` == "Nej"
    - `is_primary`: Define whether the diagnosis was a primary
      diagnosis (`diagnosetype` == "A")
    - `is_t1d`: Define whether the diagnosis was T1D-specific
      (`diagnosekode` starts with "DE10")
    - `is_t2d`: Define whether the diagnosis was T2D-specific
      (`diagnosekode` starts with "DE11")
    - `department`: Define whether the diagnosis was made by an
      endocrinological (`hovedspeciale_ans` == "medicinsk
      endokrinologi") or other medical department (`hovedspeciale_ans`
      either "Blandet medicin og kirurgi", "Intern medicin",
      "Geriatri", "Hepatologi", "Hæmatologi", "Infektionsmedicin",
      "Kardiologi", "Medicinsk allergologi", "Medicinsk
      gastroenterologi", "Medicinsk lungesygdomme", "Nefrologi",
      "Reumatologi", "Palliativ medicin", "Akut medicin",
      "Dermato-venerologi", "Neurologi", "Onkologi", "Fysiurgi", or
      "Tropemedicin")
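
As a rough illustration of the LPR2 branch above, the internal variables could be derived along these lines. This is a sketch under assumptions: the input mimics the output of `join_lpr2()`, the data values are invented, and the `department` labels ("endocrinology", "other medical") are hypothetical placeholders rather than the package's actual values.

```r
# Toy data shaped like the output of join_lpr2(); values are invented.
lpr2 <- data.frame(
  pnr = c("001", "001", "002", "003"),
  date = as.Date(c("2005-03-01", "2006-07-15", "1990-05-05", "2001-02-02")),
  c_spec = c(8, 12, 38, 8),
  c_diag = c("DE101", "DE119", "25000", "DO800"),
  c_diagtype = c("A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Keep diabetes diagnoses registered as primary (A) or secondary (B).
is_diabetes <- grepl("^(DE1[0-4]|249|250)", lpr2$c_diag) &
  lpr2$c_diagtype %in% c("A", "B")
diabetes <- lpr2[is_diabetes, ]

# Derive the classification flags described in the list above.
diabetes$is_primary <- diabetes$c_diagtype == "A"
diabetes$is_t1d <- grepl("^(DE10|249)", diabetes$c_diag)
diabetes$is_t2d <- grepl("^(DE11|250)", diabetes$c_diag)

# Department labels are placeholders chosen for this sketch.
diabetes$department <- ifelse(
  diabetes$c_spec == 8,
  "endocrinology",
  ifelse(
    diabetes$c_spec < 8 | (diabetes$c_spec >= 9 & diabetes$c_spec <= 30),
    "other medical",
    NA_character_
  )
)
```

The LPR3 branch would follow the same pattern with `diagnosekode`, `diagnosetype`, and `hovedspeciale_ans`, plus the additional `senere_afkraeftet` == "Nej" condition.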

These intermediate results are combined for further processing, and
`include_diabetes_diagnoses()` outputs a single `data.frame` with the
following variables (up to two rows per individual):

- identifier variable (`pnr`)
- dates of the first and second hospital diabetes diagnosis
(`diagnosis_date`)
- number of type 1 diabetes-specific primary diagnosis codes from
endocrinological departments (`n_t1d_endo`)
- number of type 2 diabetes-specific primary diagnosis codes from
endocrinological departments (`n_t2d_endo`)
- number of type 1 diabetes-specific primary diagnosis codes from
medical departments (`n_t1d_medical`)
- number of type 2 diabetes-specific primary diagnosis codes from
medical departments (`n_t2d_medical`)

The output is passed to the `get_diagnosis_date()` function for the
final step of the inclusion process and is subsequently used to classify
diabetes type.

#### Diabetes-specific podiatrist services

The function `include_podiatrist_services()` uses `sysi` or `sssy` as
input to extract the dates of all diabetes-specific podiatrist services.

<!-- TODO: Add details on how this filtering should be done -->
These dates are extracted by filtering values beginning with "54" in the
`speciale` variable of the `sssy` and `sysi` registers by default
(alternatively, the function can take the `spec2` variable as input
instead, if that is the data available to the user). In addition,
services provided to a child of the individual (`barnmak` != 0) are
excluded. An internal helper function, `get_unique_honuge_dates()`,
generates a proper date variable from the week-year (wwyy-formatted)
variable (`honuge`) found in the raw data and de-duplicates multiple
services registered on the same date.
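
One way the conversion from `honuge` to a date could work is sketched below. Several assumptions are made here that the text above does not confirm: the week is treated as an ISO week and mapped to its Monday, two-digit years at or above a pivot of 90 are read as 19yy and the rest as 20yy, and the function name `honuge_to_date()` is hypothetical.

```r
# Hypothetical helper: convert "wwyy" strings to the Monday of that ISO week.
honuge_to_date <- function(honuge, pivot = 90L) {
  week <- as.integer(substr(honuge, 1, 2))
  yy <- as.integer(substr(honuge, 3, 4))
  # Assumed century rule: yy >= pivot means 19yy, otherwise 20yy.
  year <- ifelse(yy >= pivot, 1900L + yy, 2000L + yy)
  # 4 January always falls in ISO week 1.
  jan4 <- as.Date(sprintf("%d-01-04", year))
  wday <- as.POSIXlt(jan4)$wday          # 0 = Sunday, ..., 6 = Saturday
  iso_wday <- ifelse(wday == 0, 7, wday) # 1 = Monday, ..., 7 = Sunday
  # Monday of week 1, shifted forward by (week - 1) whole weeks.
  jan4 - (iso_wday - 1) + 7 * (week - 1)
}

# Toy podiatrist records; values are invented.
services <- data.frame(
  pnr = c("001", "001", "002"),
  honuge = c("0120", "0120", "5299"),
  stringsAsFactors = FALSE
)
services$date <- honuge_to_date(services$honuge)

# De-duplicate services registered on the same date for the same person.
unique_services <- unique(services[, c("pnr", "date")])
```

Computing the week start arithmetically avoids relying on week-number parsing in `strptime()`, which can behave differently across platforms.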

#### GLD purchases
`include_podiatrist_services()` outputs a 2-column data frame with up to
two rows for each individual, containing the following variables:

The function `include_gld_purchases()` uses `lmdb` to extract the dates
of all GLD purchases (from 1997 onwards).
- identifier variable (`pnr`)
- the dates of the first and second diabetes-specific podiatrist
record (`dates`)

<!-- TODO: Add details on how this filtering should be done -->
The output is passed to the `get_diagnosis_date()` function for the
final step of the inclusion process.

<!-- TODO: Add this + link to resource "For details about this, see [link]." -->
#### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%)

### Exclusion events
The function `include_hba1c()` uses `lab_forsker` as the input data to
extract the dates of all elevated HbA1c test results, using the
appropriate cut-offs:

#### HbA1c tests and GLD purchases during pregnancy
- IFCC units: `analysiscode` NPU27300, any `value` $\geq$ 48 mmol/mol.
- DCCT units: `analysiscode` NPU03835, any `value` $\geq$ 6.5%.

The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as
input and is used to exclude both HbA1c tests and GLD purchases during
pregnancy.
```{r, echo=FALSE}
algorithm |>
filter(name=="hba1c") |>
knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.")
```

Internally, this relies on the function `get_pregnancy_dates()` that
contains the following three helper functions:
Multiple elevated results on the same day within each individual are
deduplicated, to account for the same test result often being reported
twice (once in IFCC units and once in DCCT units).
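
The cut-off filter and the same-day deduplication can be sketched as follows. This is illustrative only: `analysiscode` and `value` are named above, while the identifier and date column names in the toy input (`pnr`, `samplingdate`) are assumptions made for this example.

```r
# Toy stand-in for lab_forsker; column names other than analysiscode and
# value are assumptions, and all values are invented.
lab <- data.frame(
  pnr = c("001", "001", "001", "002", "003"),
  samplingdate = as.Date(c(
    "2020-01-01", "2020-01-01", "2020-06-01", "2019-03-03", "2018-08-08"
  )),
  analysiscode = c("NPU27300", "NPU03835", "NPU27300", "NPU03835", "NPU27300"),
  value = c(50, 6.7, 45, 6.5, 47),
  stringsAsFactors = FALSE
)

# Apply the unit-specific cut-off to each test result.
elevated <- (lab$analysiscode == "NPU27300" & lab$value >= 48) |
  (lab$analysiscode == "NPU03835" & lab$value >= 6.5)

# One row per person and date, even if both units were reported that day.
hba1c <- unique(data.frame(
  pnr = lab$pnr[elevated],
  dates = lab$samplingdate[elevated],
  stringsAsFactors = FALSE
))
```

In this toy data, person "001" has the same elevated result reported in both units on one day, so only a single row is kept for that date.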

- `calculate_pregnancy_index_date_for_mc_visits_wo_end_date()` (this
might be removed with the inclusion of the birth register)
- `get_pregnancy_end_dates()`: Keep maternal care visits with an end
date and drop visits between 40 weeks before end date and 12 weeks
after end date.
- `get_maternal_care_visit_dates_without_end_date()`: Uses the output
from `get_pregnancy_end_dates()` which identifies maternal care
visits *with* end dates to derive maternal care visits *without* end
dates. below.
`include_hba1c()` outputs a 2-column data frame containing the following
variables:

<!-- TODO: What is done with the mc visits without end dates then? -->
- identifier variable (`pnr`)
- the dates of all elevated HbA1c test results (`dates`).

<!-- TODO: Add details on how this filtering should be done -->
The output is passed to the `exclude_pregnancy()` function for censoring
of elevated results due to potential gestational diabetes (see below).

#### Glucose-lowering brand drugs for weight loss
#### GLD purchases

The function `exclude_wld_purchases()` uses lmdb as input and excludes
the brand drugs Saxenda and Wegovy.
The function `include_gld_purchases()` uses `lmdb` to extract the dates
of all GLD purchases.

<!-- TODO: Add details on how this filtering should be done -->
These dates are extracted by including all values beginning with "A10"
in the `atc` variable of the `lmdb` register. Since the diagnosis code
data on pregnancies (see below) is insufficient to perform censoring
prior to 1997, `include_gld_purchases()` only extracts dates from 1997
onward by default (if Medical Birth Register data is available to use
for censoring, the extraction window can be extended).
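
A minimal sketch of this extraction, assuming a toy `lmdb` with only the variables mentioned in this section. The `start_date` variable is a hypothetical stand-in for however the default 1997 cut-off is configured.

```r
# Toy stand-in for lmdb; values are invented.
lmdb <- data.frame(
  pnr = c("001", "001", "002", "003"),
  eksd = as.Date(c("1996-11-30", "1998-02-01", "2005-05-05", "2010-10-10")),
  atc = c("A10BA02", "A10BA02", "C09AA02", "A10AB01"),
  volume = c(100, 100, 30, 5),
  apk = c(1, 2, 1, 1),
  indo = c(NA, "0000276", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical default: only keep purchases from 1997 onward.
start_date <- as.Date("1997-01-01")

# Keep glucose-lowering drugs (ATC codes beginning with "A10").
gld <- lmdb[grepl("^A10", lmdb$atc) & lmdb$eksd >= start_date, ]

# Rename to the output variable names listed below.
names(gld)[names(gld) == "eksd"] <- "date"
names(gld)[names(gld) == "apk"] <- "number_of_packages"
names(gld)[names(gld) == "indo"] <- "indication_code"
```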

#### Metformin purchases for women below age 40
This function outputs a `data.frame` with the following variables needed
later in the classification part of the function flow:

The function `exclude_potential_pcos()` as input to exclude all
purchases of metformin by women below age 40 (i.e., \<= 39 years old) at
the date of purchase. It relies on `bef` as input.
- identifier variable (`pnr`)
- date (originally `eksd`, renamed to `date`)
- type of drug (`atc`)
- amount purchased (`volume` and `number_of_packages` (originally
named `apk`))
- indication code (originally `indo`, renamed to `indication_code`)

This function contains two helper functions:
These events are then passed to a chain of exclusion functions:
`exclude_wld_purchases()`, `exclude_potential_pcos()`, and
`exclude_pregnancy()`, described in the sections below.

- `keep_women()`
- `drop_age_40_below()`
### Exclusion events

<!-- TODO: Add details on how this filtering should be done -->
#### Metformin purchases potentially for the treatment of polycystic ovary syndrome

The function `exclude_potential_pcos()` takes the output from
`include_gld_purchases()` and `bef` (information on sex and date of
birth) as inputs and censors (filters out) all purchases of metformin
that were either made by women below age 40 at the date of purchase
(`atc` = "A10BA02" & `sex` = "woman" & age at purchase
(`date` - `date_of_birth`) \< 40 years) or registered with an indication
code suggesting treatment of polycystic ovary syndrome (`atc` =
"A10BA02" & `sex` = "woman" & `indication_code` either "0000092",
"0000276", or "0000781").
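
The censoring rule can be sketched in base R as below. This is an illustration with invented data, not the package's code. Age is evaluated by comparing the purchase date with the 40th birthday, which is one of several reasonable ways to compute "below age 40".

```r
# Toy GLD purchases and population data; values are invented.
gld <- data.frame(
  pnr = c("001", "002", "003", "004"),
  date = as.Date(rep("2015-05-01", 4)),
  atc = c("A10BA02", "A10BA02", "A10BA02", "A10AB01"),
  indication_code = c(NA, NA, "0000276", NA),
  stringsAsFactors = FALSE
)
bef <- data.frame(
  pnr = c("001", "002", "003", "004"),
  sex = rep("woman", 4),
  date_of_birth = as.Date(c("1980-01-01", "1960-01-01", "1960-01-01", "1990-01-01")),
  stringsAsFactors = FALSE
)

merged <- merge(gld, bef, by = "pnr")

# Date of the 40th birthday, computed by shifting the birth year.
birthday <- as.POSIXlt(merged$date_of_birth)
birthday$year <- birthday$year + 40L
fortieth_birthday <- as.Date(birthday)

pcos_codes <- c("0000092", "0000276", "0000781")
is_metformin_woman <- merged$atc == "A10BA02" & merged$sex == "woman"

# Censor metformin purchases by women under 40 or with a PCOS indication.
drop <- is_metformin_woman &
  (merged$date < fortieth_birthday | merged$indication_code %in% pcos_codes)

kept <- merged[!drop, names(gld)]
```

Here the purchases for "001" (metformin at age 35) and "003" (metformin with a PCOS indication code) are censored, while "002" and "004" are kept.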

After these exclusions are made, the output is passed to
`exclude_pregnancy()` for further censoring, described below:

#### HbA1c tests and GLD purchases during pregnancy

The function `exclude_pregnancy()` takes the combined outputs from
`join_lpr2()`, `join_lpr3()`, `include_hba1c()`, and
`exclude_potential_pcos()` and uses diagnoses from LPR2 or LPR3 to
exclude both elevated HbA1c tests and GLD purchases during pregnancy, as
these may be due to gestational diabetes, rather than type 1 or type 2
diabetes.

Internally, this relies on the function `get_pregnancy_dates()` that
uses diagnoses registered in the National Patient Register to extract
the dates of all pregnancy endings (live births or miscarriages). These
are identified by filtering values beginning with "DO0[0-6]",
"DO8[0-4]" or "DZ3[37]" in the `c_diag` variable in the LPR2 data
(`diagnosekode` in LPR3 data). The dates output by
`get_pregnancy_dates()` are used to exclude all inclusion events
registered between 40 weeks before and 12 weeks after a pregnancy
ending.
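
The exclusion window can be sketched as follows, assuming the pregnancy end dates have already been extracted. The data frames and the column name `pregnancy_end` are hypothetical and only serve to illustrate the 40-weeks-before to 12-weeks-after rule.

```r
# Toy inclusion events and pregnancy end dates; values are invented.
events <- data.frame(
  pnr = c("001", "001", "002"),
  date = as.Date(c("2010-01-01", "2012-06-01", "2010-01-01")),
  stringsAsFactors = FALSE
)
pregnancy_dates <- data.frame(
  pnr = "001",
  pregnancy_end = as.Date("2010-03-01"),
  stringsAsFactors = FALSE
)

# Flag events within 40 weeks before to 12 weeks after any pregnancy ending
# registered for the same person.
in_window <- vapply(seq_len(nrow(events)), function(i) {
  ends <- pregnancy_dates$pregnancy_end[pregnancy_dates$pnr == events$pnr[i]]
  any(events$date[i] >= ends - 40 * 7 & events$date[i] <= ends + 12 * 7)
}, logical(1))

kept <- events[!in_window, ]
```

Person "002" has no registered pregnancy ending, so `any()` over an empty comparison returns `FALSE` and the event is kept.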

After these exclusion functions have been applied, the output serves as
inputs to two sets of functions:

1. the `get_diagnosis_date()` function for the final step of the
inclusion process.
2. the `get_only_insulin_purchases()`,
`get_insulin_purchases_within_180_days()`, and
`get_insulin_is_two_thirds_of_gld_doses()` helper functions for the
classification of diabetes type.

### Get diagnosis date

@@ -299,3 +435,4 @@ is within a time-period of insufficient data coverage,
contains the inclusion date of this individual.

<!-- TODO: Specify the "stable" time-period: e.g., later than 1997 -->