-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: add some unit tests for hba1c inclusion #101
Changes from all commits
a5473ed
0f821df
8bf8902
8f5e2c6
be73d0c
3d23094
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
name,logic | ||
hba1c,(analysiscode == 'NPU27300' AND value >= 48) OR (analysiscode == 'NPU03835' AND value >= 6.5) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,5 +8,6 @@ | |
|
||
library(testthat) | ||
library(osdc) | ||
library(dplyr) | ||
|
||
test_check("osdc") |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
lab_forsker <- tibble::tribble( | ||
~patient_cpr, ~samplingdate, ~analysiscode, ~value, | ||
"498718589800", "20230101", "NPU27300", 49, | ||
"498718589801", "20230101", "NPU03835", 6.6, | ||
"498718589802", "20230101", "NPU03835", 6.3, | ||
"498718589803", "20230101", "NPU27300", 47, | ||
# Duplicate patient_cpr but with the old units. | ||
"498718589803", "20210101", "NPU27300", 49, | ||
"498718589803", "20220101", "NPU03835", 6.5, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens when one patient has both the old and new units that match? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The way it's been done so far, in cases of 03835 NPU-codes and 27300 NPU-codes within the same pnr and date, the 03835-row is filtered out/deduplicated. By the way, 03835-%-values are converted to 27300-mmol/mol-values using the following formula: or in code, e.g.: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is helpful. Just to confirm, if an old unit is measured on the same day as a new unit (purely hypothetical), and the old unit is above the threshold but the new unit isn't, is the new unit's threshold used or the old? And also, if someone had a measurement on the same day and one value was above the threshold and another was below it, as long as there is a true condition in the date, it is used? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Your example is actually very real, since the test results would usually be reported back in both formats (unless the doctor actively unticks a box when filling out the form to request analyses). So you get two rows of data where everything is identical, except for the NPU-code and the value of the result - down to the exact time of the sample being drawn. This is done to appease clinicians who are more comfortable reading the old units. Since only one analysis is actually performed in the lab, the "old" format is simply generated in the lab IT system by mathematically converting from the "new" format, so both should be identical in terms of absolute values and relative to the diagnostic threshold. You can have multiple samples taken/recorded on the same day, and those can differ in terms of the result. Even if just one of them is positive, it counts as an inclusion event, yes. |
||
# Duplicate patient_cpr when old and new units are the same date. | ||
"498718589805", "20000101", "NPU03835", 6.5, | ||
"498718589805", "20000101", "NPU27300", 49, | ||
# Duplicate but with old below threshold and new above it. | ||
"498718589806", "20000101", "NPU03835", 6.3, | ||
"498718589806", "20000101", "NPU27300", 49, | ||
# Duplicate but with new below threshold and old above it. | ||
"498718589807", "20200101", "NPU03835", 6.6, | ||
"498718589807", "20200101", "NPU27300", 47, | ||
Comment on lines
+17
to
+18
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Aastedet So here, if there is a duplicate date, with new below but old above the threshold, should that be counted as a true condition? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that should be counted as a positive inclusion event. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Whenever you approve this PR, I'll merge it in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 😉 😉 @signekb 😄 😄 |
||
"498718589808", "20220101", "NPU00000", 100, | ||
"498718589809", "20220101", "NPU00000", 5 | ||
) | ||
|
||
expected <- tibble::tribble( | ||
~pnr, ~date, ~included_hba1c, | ||
"498718589800", "20230101", TRUE, | ||
"498718589801", "20230101", TRUE, | ||
"498718589803", "20210101", TRUE, | ||
"498718589803", "20220101", TRUE, | ||
"498718589805", "20000101", TRUE, | ||
"498718589806", "20000101", TRUE, | ||
"498718589807", "20200101", TRUE | ||
) | ||
|
||
test_that("dataset needs expected variables", { | ||
actual <- lab_forsker | ||
expect_error(include_hba1c(actual)) | ||
}) | ||
|
||
test_that("those with inclusion are kept", { | ||
actual <- include_hba1c(lab_forsker) | ||
expect_equal(actual, expected) | ||
}) | ||
|
||
test_that("casing of input variables doesn't matter", { | ||
actual <- lab_forsker |> | ||
rename_with(\(columns) toupper(columns)) |> | ||
include_hba1c() | ||
expect_equal(actual, expected) | ||
}) | ||
|
||
test_that("verification works for DuckDB Database", { | ||
actual <- arrow::to_duckdb(lab_forsker) |> | ||
include_hba1c() | ||
|
||
expect_equal(actual, expected) | ||
}) | ||
|
||
test_that("verification works for Arrow Tables (from Parquet)", { | ||
actual <- arrow::as_arrow_table(lab_forsker) |> | ||
include_hba1c() | ||
|
||
expect_equal(actual, expected) | ||
}) | ||
|
||
test_that("verification works for data.frame", { | ||
actual <- as.data.frame(lab_forsker) |> | ||
include_hba1c() | ||
|
||
expect_equal(actual, expected) | ||
}) | ||
|
||
test_that("verification works for data.table", { | ||
actual <- data.table::as.data.table(lab_forsker) |> | ||
include_hba1c() | ||
|
||
expect_equal(actual, expected) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how I was thinking the logic would be stored and written down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very clever! For HbA1c we (currently) convert NPU03835 (%) values to NPU27300 (mmol/mol) values in the pipeline, so we could potentially simplify the logic to
hba1c,(analysiscode %in% c('NPU27300', 'NPU03835') AND value >= 48)
or similarThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be faster from a performance point of view to filter only and not do an additional calculation step before hand.