add age_categorical or allow factor class for age tag #148

Open
avallecam opened this issue Sep 27, 2024 · 0 comments

avallecam (Member) commented Sep 27, 2024

Thinking about the downstream implications of, and opportunities for, `cleanepi::timespan()`, I found that converting age into a categorical variable early in the pipeline would make it possible to get stratified estimates of the CFR by age category (see the example in the last challenge about severity heterogeneity).
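As a minimal base-R sketch of what a factor age column enables, assume a hypothetical toy line list with a logical `died` outcome column (neither the data nor the column comes from the cleanepi test data). The naive CFR here is simply deaths divided by cases within each age category, ignoring delays from onset to outcome.

```r
# Hypothetical toy line list: one row per case, numeric age and a logical outcome
cases <- data.frame(
  age_in_years = c(5, 12, 22, 30, 41, 50, 63, 70, 78, 85),
  died = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Same age breaks as the reprex: left-closed intervals, last one closed on both ends
cases$age_category <- cut(
  x = cases$age_in_years,
  breaks = c(0, 20, 35, 60, Inf),
  include.lowest = TRUE,
  right = FALSE
)

# Naive CFR per age category: the mean of a logical vector is the share of deaths
cfr_by_age <- tapply(cases$died, cases$age_category, mean)
cfr_by_age
```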

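However, when building an end-to-end pipeline from cleanepi through linelist to incidence2, I found that the accepted type for the age tag was a bottleneck: the default types for `age` do not include factor, so tagging the categorical column as `age` fails validation. I can tag it as an extra variable and declare its type at validation time, but those extra lines add complexity when I want to keep a training session direct.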

Given that age as a factor is useful in downstream analyses, should we add factor to the default accepted types for the age tag (or add a dedicated `age_categorical` tag)?

```r
library(cleanepi)
library(linelist)
library(incidence2)
library(tidyverse)

dat <- readRDS(system.file("extdata", "test_df.RDS", package = "cleanepi"))

dat %>% 
  dplyr::as_tibble() %>% 
  # standardize column names and dates
  cleanepi::standardize_column_names() %>% 
  cleanepi::standardize_dates(
    target_columns = c("date_of_birth","date_first_pcr_positive_test")
  ) %>% 
  # calculate the age in 'years' and return the remainder in 'months'
  cleanepi::timespan(
    target_column = "date_of_birth",
    end_date = Sys.Date(),
    span_unit = "years",
    span_column_name = "age_in_years",
    span_remainder_unit = "months"
  ) %>% 
  # categorize the age numerical variable [add as a challenge hint]
  dplyr::mutate(
    age_category = base::cut(
      x = age_in_years,
      breaks = c(0,20,35,60,Inf), # replace with max value if known
      include.lowest = TRUE,
      right = FALSE
    )
    # age_category = Hmisc::cut2(x = age_in_years,cuts = c(20,35,60))
  ) %>% 
  # tag variables
  linelist::make_linelist(
    date_reporting = "date_first_pcr_positive_test",
    # age = "age_category", # does not pass validation, instead
    age = "age_in_years",
    occupation = "age_category", # incorrect alternative to make it valid
    # age_category = "age_category", # correct procedure
    # allow_extra = TRUE
  ) %>% 
  # validate linelist
  linelist::validate_linelist(
    # allow_extra = TRUE,
    # ref_types = tags_types(
    #   age_category = c("factor"),
    #   allow_extra = TRUE
    # )
  ) %>% 
  # get tags dataframe
  linelist::tags_df() %>%
  # aggregate and visualize
  incidence2::incidence(
    date_index = "date_reporting",
    groups = "occupation", # but expected: "age_category"
    interval = "month", 
    complete_dates = TRUE
  )
#> # incidence:  30 x 4
#> # count vars: date_reporting
#> # groups:     occupation
#>    date_index occupation count_variable count
#>    <yrmon>    <fct>      <chr>          <int>
#>  1 2020-Dec   [20,35)    date_reporting     0
#>  2 2020-Dec   [35,60)    date_reporting     1
#>  3 2020-Dec   [60,Inf]   date_reporting     0
#>  4 2021-Jan   [20,35)    date_reporting     0
#>  5 2021-Jan   [35,60)    date_reporting     0
#>  6 2021-Jan   [60,Inf]   date_reporting     1
#>  7 2021-Feb   [20,35)    date_reporting     2
#>  8 2021-Feb   [35,60)    date_reporting     1
#>  9 2021-Feb   [60,Inf]   date_reporting     2
#> 10 2021-Mar   [20,35)    date_reporting     1
#> # ℹ 20 more rows

packageVersion("linelist")
#> [1] '1.1.4'
```

Created on 2024-09-27 with reprex v2.1.0
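For comparison, the validated route in the commented-out lines of the reprex (an extra `age_category` tag plus a declared type for it) can be written as a standalone sketch. It uses a hypothetical three-row toy data frame instead of the cleanepi test data and assumes linelist 1.1.x, with `make_linelist()`, `validate_linelist()`, `tags_types()` and `tags_df()` taking the same arguments as in the reprex.

```r
# Assumes linelist 1.1.x is installed; the toy data below is hypothetical
dat <- data.frame(
  date_first_pcr_positive_test = as.Date(c("2021-01-05", "2021-02-10", "2021-02-20")),
  age_in_years = c(25, 41, 67)
)

# Categorize the numeric age into a factor, with the same breaks as the reprex
dat$age_category <- cut(
  x = dat$age_in_years,
  breaks = c(0, 20, 35, 60, Inf),
  include.lowest = TRUE,
  right = FALSE
)

# Tag the factor under an extra, non-default tag
x <- linelist::make_linelist(
  dat,
  date_reporting = "date_first_pcr_positive_test",
  age = "age_in_years",
  age_category = "age_category",
  allow_extra = TRUE
)

# Validate, declaring "factor" as the accepted type for the extra tag
x <- linelist::validate_linelist(
  x,
  allow_extra = TRUE,
  ref_types = linelist::tags_types(
    age_category = c("factor"),
    allow_extra = TRUE
  )
)

# Tagged columns, renamed to their tag names
linelist::tags_df(x)
```

The `allow_extra` and `ref_types` arguments are the extra lines that a default factor type for the age tag would make unnecessary in a training pipeline.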

avallecam changed the title from "add age_categorical or allow factor class for age" to "add age_categorical or allow factor class for age tag" on Sep 27, 2024