Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: add some unit tests for hba1c inclusion #101

Merged
merged 6 commits into from
Jun 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions data-raw/algorithm.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
name,logic
hba1c,(analysiscode == 'NPU27300' AND value >= 48) OR (analysiscode == 'NPU03835' AND value >= 6.5)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how I was thinking the logic would be stored and written down.

Copy link
Collaborator

@Aastedet Aastedet May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clever! For HbA1c we (currently) convert NPU03835 (%) values to NPU27300 (mmol/mol) values in the pipeline, so we could potentially simplify the logic to hba1c,(analysiscode %in% c('NPU27300', 'NPU03835') AND value >= 48) or similar

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be faster from a performance point of view to filter only and not do an additional calculation step before hand.


1 change: 1 addition & 0 deletions tests/testthat.R
Original file line number Diff line number Diff line change
Expand Up @@ -8,5 +8,6 @@

library(testthat)
library(osdc)
library(dplyr)

test_check("osdc")
77 changes: 77 additions & 0 deletions tests/testthat/test-include-hba1c.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
lab_forsker <- tibble::tribble(
~patient_cpr, ~samplingdate, ~analysiscode, ~value,
"498718589800", "20230101", "NPU27300", 49,
"498718589801", "20230101", "NPU03835", 6.6,
"498718589802", "20230101", "NPU03835", 6.3,
"498718589803", "20230101", "NPU27300", 47,
# Duplicate patient_cpr but with the old units.
"498718589803", "20210101", "NPU27300", 49,
"498718589803", "20220101", "NPU03835", 6.5,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when one patient has both the old and new units that match?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way it's been done so far, in cases of 03835 NPU-codes and 27300 NPU-codes within the same pnr and date, the 03835-row is filtered out/deduplicated.
I think that might be a bit over-complicated- I could look into if simply deduplicating all rows with HbA1c-related NPU-codes by pnr and date works out the same. Then we'll only have one HbA1c measurement/inclusion event per day, so some inclusions might get delayed, but if you follow diagnostic guidelines to the letter you have to have two elevated HbA1c samples taken on different dates for them to be diagnostic of diabetes anyway, so it'd align more with that.

By the way, 03835-%-values are converted to 27300-mmol/mol-values using the following formula:
value_27300 = (value_03835 * 100 * 10.93) - 23.5

or in code, e.g.:
value = ifelse(
analysiscode == "NPU03835",
(value * 100 * 10.93) - 23.5,
value
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is helpful. Just to confirm, if an old unit is measured on the same day as a new unit (purely hypothetical), and the old unit is above the threshold but the new unit isn't, is the new unit's threshold used or the old?

And also, if someone had a measurement on the same day and one value was above the threshold and another was below it, as long as there is a true condition in the date, it is used?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your example is actually very real, since the test results would usually be reported back in both formats (unless the doctor actively unticks a box when filling out the form to request analyses). So you get two rows of data where everything is identical, except for the NPU-code and the value of the result - down to the exact time of the sample being drawn. This is done to appease clinicians who are more comfortable reading the old units. Since only one analysis is actually performed in the lab, the "old" format is simply generated in the lab IT system by mathematically converting from the "new" format, so both should be identical in terms of absolute values and relative to the diagnostic threshold.

You can have multiple samples taken/recorded on the same day, and those can differ in terms of the result. Even if just one of them is positive, it counts as an inclusion event, yes.

# Duplicate patient_cpr when old and new units are the same date.
"498718589805", "20000101", "NPU03835", 6.5,
"498718589805", "20000101", "NPU27300", 49,
# Duplicate but with old below threshold and new above it.
"498718589806", "20000101", "NPU03835", 6.3,
"498718589806", "20000101", "NPU27300", 49,
# Duplicate but with new below threshold and old above it.
"498718589807", "20200101", "NPU03835", 6.6,
"498718589807", "20200101", "NPU27300", 47,
Comment on lines +17 to +18
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aastedet So here, if there is a duplicate date, with new below but old above the threshold, should that be counted as a true condition?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should be counted as a positive inclusion event.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever you approve this PR, I'll merge it in ☺️

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😉 😉 @signekb 😄 😄

"498718589808", "20220101", "NPU00000", 100,
"498718589809", "20220101", "NPU00000", 5
)

expected <- tibble::tribble(
~pnr, ~date, ~included_hba1c,
"498718589800", "20230101", TRUE,
"498718589801", "20230101", TRUE,
"498718589803", "20210101", TRUE,
"498718589803", "20220101", TRUE,
"498718589805", "20000101", TRUE,
"498718589806", "20000101", TRUE,
"498718589807", "20200101", TRUE
)

test_that("dataset needs expected variables", {
actual <- lab_forsker
expect_error(include_hba1c(actual))
})

test_that("those with inclusion are kept", {
actual <- include_hba1c(lab_forsker)
expect_equal(actual, expected)
})

test_that("casing of input variables doesn't matter", {
actual <- lab_forsker |>
rename_with(\(columns) toupper(columns)) |>
include_hba1c()
expect_equal(actual, expected)
})

test_that("verification works for DuckDB Database", {
actual <- arrow::to_duckdb(lab_forsker) |>
include_hba1c()

expect_equal(actual, expected)
})

test_that("verification works for Arrow Tables (from Parquet)", {
actual <- arrow::as_arrow_table(lab_forsker) |>
include_hba1c()

expect_equal(actual, expected)
})

test_that("verification works for data.frame", {
actual <- as.data.frame(lab_forsker) |>
include_hba1c()

expect_equal(actual, expected)
})

test_that("verification works for data.table", {
actual <- data.table::as.data.table(lab_forsker) |>
include_hba1c()

expect_equal(actual, expected)
})
Loading