add `age_categorical` or allow `factor` class for `age` tag #148

avallecam · 2024-09-27T17:08:02Z

Thinking about the downstream implications and opportunities of `cleanepi::timespan()`, I found that having age as a categorical variable early in the pipeline makes it possible to get CFR estimates stratified by age category (see the example in the last challenge about severity heterogeneity).
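For reference, here is a minimal base R sketch of the categorisation step on its own. The `ages` vector is made up for illustration and is not taken from the cleanepi test data; the breaks are the same ones used in the reprex further down.

```r
# Made-up ages, for illustration only
ages <- c(0, 19, 20, 34.5, 60, 83)

# Left-closed intervals; include.lowest = TRUE closes the last one on the right
age_category <- cut(
  x = ages,
  breaks = c(0, 20, 35, 60, Inf),
  include.lowest = TRUE,
  right = FALSE
)

class(age_category)
levels(age_category)
#> [1] "[0,20)"   "[20,35)"  "[35,60)"  "[60,Inf]"
```

The result is a `factor`, which is the class the `age` tag does not accept by default.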

However, when building an end-to-end pipeline from cleanepi to linelist to incidence2, I found that the accepted type for the `age` tag was a bottleneck. I can allow an extra tag and validate it, but those lines add a layer of complexity that gets in the way when I want to be direct during a training session.

Given that age as a factor is already useful in downstream analysis, should we add `factor` to the default accepted types for the `age` tag?

```r
library(cleanepi)
library(linelist)
library(incidence2)
library(tidyverse)

dat <- readRDS(system.file("extdata", "test_df.RDS", package = "cleanepi"))

dat %>% 
  dplyr::as_tibble() %>% 
  # standardize column names and dates
  cleanepi::standardize_column_names() %>% 
  cleanepi::standardize_dates(
    target_columns = c("date_of_birth","date_first_pcr_positive_test")
  ) %>% 
  # calculate the age in 'years' and return the remainder in 'months'
  cleanepi::timespan(
    target_column = "date_of_birth",
    end_date = Sys.Date(),
    span_unit = "years",
    span_column_name = "age_in_years",
    span_remainder_unit = "months"
  ) %>% 
  # categorize the age numerical variable [add as a challenge hint]
  dplyr::mutate(
    age_category = base::cut(
      x = age_in_years,
      breaks = c(0,20,35,60,Inf), # replace with max value if known
      include.lowest = TRUE,
      right = FALSE
    )
    # age_category = Hmisc::cut2(x = age_in_years,cuts = c(20,35,60))
  ) %>% 
  # tag variables
  linelist::make_linelist(
    date_reporting = "date_first_pcr_positive_test",
    # age = "age_category", # does not pass validation, instead
    age = "age_in_years",
    occupation = "age_category", # incorrect alternative to make it valid
    # age_category = "age_category", # correct procedure
    # allow_extra = TRUE
  ) %>% 
  # validate linelist
  linelist::validate_linelist(
    # allow_extra = TRUE,
    # ref_types = tags_types(
    #   age_category = c("factor"),
    #   allow_extra = TRUE
    # )
  ) %>% 
  # get tags dataframe
  linelist::tags_df() %>%
  # aggregate and visualize
  incidence2::incidence(
    date_index = "date_reporting",
    groups = "occupation", # but expected: "age_category"
    interval = "month", 
    complete_dates = TRUE
  )
#> # incidence:  30 x 4
#> # count vars: date_reporting
#> # groups:     occupation
#>    date_index occupation count_variable count
#>    <yrmon>    <fct>      <chr>          <int>
#>  1 2020-Dec   [20,35)    date_reporting     0
#>  2 2020-Dec   [35,60)    date_reporting     1
#>  3 2020-Dec   [60,Inf]   date_reporting     0
#>  4 2021-Jan   [20,35)    date_reporting     0
#>  5 2021-Jan   [35,60)    date_reporting     0
#>  6 2021-Jan   [60,Inf]   date_reporting     1
#>  7 2021-Feb   [20,35)    date_reporting     2
#>  8 2021-Feb   [35,60)    date_reporting     1
#>  9 2021-Feb   [60,Inf]   date_reporting     2
#> 10 2021-Mar   [20,35)    date_reporting     1
#> # ℹ 20 more rows

packageVersion("linelist")
#> [1] '1.1.4'
```

Created on 2024-09-27 with reprex v2.1.0
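For comparison, the extra-tag route mentioned above (the "correct procedure" left commented out in the reprex) could look roughly like the sketch below. The `toy` data frame is made up for illustration, and I am assuming the `allow_extra` and `ref_types` arguments behave as documented in linelist 1.1.4.

```r
library(linelist)

# Toy data, made up for illustration (not the cleanepi test dataset)
toy <- data.frame(
  date_first_pcr_positive_test = as.Date(
    c("2021-01-05", "2021-02-10", "2021-02-20")
  ),
  age_in_years = c(18, 42, 67)
)
toy$age_category <- cut(
  x = toy$age_in_years,
  breaks = c(0, 20, 35, 60, Inf),
  include.lowest = TRUE,
  right = FALSE
)

# Tag the factor column under an extra, non-default tag
ll <- make_linelist(
  toy,
  date_reporting = "date_first_pcr_positive_test",
  age = "age_in_years",
  age_category = "age_category",
  allow_extra = TRUE
)

# Validation has to be told about the extra tag and its expected class
ll <- validate_linelist(
  ll,
  allow_extra = TRUE,
  ref_types = tags_types(
    age_category = "factor",
    allow_extra = TRUE
  )
)

# The tags data frame now carries the factor under its own tag name
out <- tags_df(ll)
str(out)
```

This works, but the two `allow_extra = TRUE` arguments plus the custom `ref_types` are the extra lines I would rather not have to explain in a training session.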


avallecam changed the title ~~add age_categorical or allow factor class for age~~ add age_categorical or allow factor class for age tag Sep 27, 2024
