Thinking about the downstream implications of, and opportunities for, cleanepi::timespan(), I found that getting age as a categorical variable early in the pipeline makes it possible to get estimates of CFR stratified by age category (see the example in the last challenge about severity heterogeneity).
But when building an end-to-end pipeline from cleanepi to linelist to incidence2, I found that the tag type for age was a bottleneck: a factor column tagged as age does not pass validation. Even though I can allow an extra tag to pass validation, those lines add an extra layer of complexity when I want to be direct during a training session.
Knowing that age as a factor is already useful in downstream analysis, should we add factor to the default tag types for age?
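For comparison, the effect of the proposed default can be approximated today by overriding the accepted types for age at validation time. The following is a minimal sketch, not a tested recommendation: the toy data frame is hypothetical and only stands in for the cleaned cleanepi output, and it assumes that passing a named character vector to tags_types() replaces the default types for that tag.

```r
library(linelist)

# Hypothetical toy data standing in for the cleaned cleanepi output
dat <- data.frame(
  date_first_pcr_positive_test = as.Date("2021-01-01") + c(0, 10, 40, 75),
  age_in_years = c(12, 27, 44, 71)
)
dat$age_category <- cut(
  dat$age_in_years,
  breaks = c(0, 20, 35, 60, Inf),
  include.lowest = TRUE,
  right = FALSE
)

# Tag the factor column directly as `age`
x <- make_linelist(
  dat,
  date_reporting = "date_first_pcr_positive_test",
  age = "age_category"
)

# Accept factor for `age` in this validation call only
# (assumption: this mimics what a changed default would allow)
age_types <- tags_types(age = c("numeric", "integer", "factor"))
validated <- validate_linelist(x, ref_types = age_types)

tags_df(validated)
```

This still needs one extra argument in validate_linelist(), so it does not remove the complexity for a training session; it only shows what the pipeline would look like if factor were accepted for age.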
library(cleanepi)
library(linelist)
library(incidence2)
library(tidyverse)
dat <- readRDS(system.file("extdata", "test_df.RDS", package = "cleanepi"))

dat %>%
  dplyr::as_tibble() %>%
  # standardize column names and dates
  cleanepi::standardize_column_names() %>%
  cleanepi::standardize_dates(
    target_columns = c("date_of_birth", "date_first_pcr_positive_test")
  ) %>%
  # calculate the age in 'years' and return the remainder in 'months'
  cleanepi::timespan(
    target_column = "date_of_birth",
    end_date = Sys.Date(),
    span_unit = "years",
    span_column_name = "age_in_years",
    span_remainder_unit = "months"
  ) %>%
  # categorize the age numerical variable [add as a challenge hint]
  dplyr::mutate(
    age_category = base::cut(
      x = age_in_years,
      breaks = c(0, 20, 35, 60, Inf), # replace with max value if known
      include.lowest = TRUE,
      right = FALSE
    )
    # age_category = Hmisc::cut2(x = age_in_years,cuts = c(20,35,60))
  ) %>%
  # tag variables
  linelist::make_linelist(
    date_reporting = "date_first_pcr_positive_test",
    # age = "age_category", # does not pass validation, instead
    age = "age_in_years",
    occupation = "age_category", # incorrect alternative to make it valid
    # age_category = "age_category", # correct procedure
    # allow_extra = TRUE
  ) %>%
  # validate linelist
  linelist::validate_linelist(
    # allow_extra = TRUE,
    # ref_types = tags_types(
    #   age_category = c("factor"),
    #   allow_extra = TRUE
    # )
  ) %>%
  # get tags dataframe
  linelist::tags_df() %>%
  # aggregate and visualize
  incidence2::incidence(
    date_index = "date_reporting",
    groups = "occupation", # but expected: "age_category"
    interval = "month",
    complete_dates = TRUE
  )
#> # incidence:  30 x 4
#> # count vars: date_reporting
#> # groups:     occupation
#>    date_index occupation count_variable count
#>    <yrmon>    <fct>      <chr>          <int>
#>  1 2020-Dec   [20,35)    date_reporting     0
#>  2 2020-Dec   [35,60)    date_reporting     1
#>  3 2020-Dec   [60,Inf]   date_reporting     0
#>  4 2021-Jan   [20,35)    date_reporting     0
#>  5 2021-Jan   [35,60)    date_reporting     0
#>  6 2021-Jan   [60,Inf]   date_reporting     1
#>  7 2021-Feb   [20,35)    date_reporting     2
#>  8 2021-Feb   [35,60)    date_reporting     1
#>  9 2021-Feb   [60,Inf]   date_reporting     2
#> 10 2021-Mar   [20,35)    date_reporting     1
#> # ℹ 20 more rows
packageVersion("linelist")
#> [1] '1.1.4'
Created on 2024-09-27 with reprex v2.1.0
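The lines commented out in the reprex (marked "correct procedure") can be written out as follows. This is a minimal sketch of the extra-tag route, again on hypothetical toy data rather than the cleanepi test file; it assumes that allow_extra = TRUE is needed in make_linelist(), in validate_linelist() and in tags_types() for a custom tag such as age_category to be accepted.

```r
library(linelist)
library(incidence2)

# Hypothetical toy data standing in for the cleaned cleanepi output
dat <- data.frame(
  date_first_pcr_positive_test = as.Date("2021-01-01") + c(0, 10, 40, 75),
  age_in_years = c(12, 27, 44, 71)
)
dat$age_category <- cut(
  dat$age_in_years,
  breaks = c(0, 20, 35, 60, Inf),
  include.lowest = TRUE,
  right = FALSE
)

# Keep the numeric column under `age`, add the factor as an extra tag
x <- make_linelist(
  dat,
  date_reporting = "date_first_pcr_positive_test",
  age = "age_in_years",
  age_category = "age_category",
  allow_extra = TRUE
)

# Declare the expected type of the extra tag so validation can check it
validated <- validate_linelist(
  x,
  allow_extra = TRUE,
  ref_types = tags_types(age_category = "factor", allow_extra = TRUE)
)

# Aggregate by the extra tag instead of borrowing `occupation`
incidence(
  tags_df(validated),
  date_index = "date_reporting",
  groups = "age_category",
  interval = "month"
)
```

Compared with the reprex, the grouping column now carries the expected name, at the cost of the three allow_extra arguments that motivated this issue.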