Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predicting from cph model with categorical variables interacting with strat term #104

Open
colinorourke opened this issue Jul 22, 2021 · 3 comments

Comments

@colinorourke
Copy link

colinorourke commented Jul 22, 2021

There seems to be an issue with making predictions from a model that contains interactions between a strat term and a categorical factor. To demonstrate, I'm going to use the example from the cph help file.

library(rms) 

#example from cph help page
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
#> Error in label(age) <- "Age": could not find function "label<-"
sex <- factor(sample(c('Male','Female'), n, 
                     rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
label(dt) <- 'Follow-up Time'
#> Error in label(dt) <- "Follow-up Time": could not find function "label<-"
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
#> Error in UseMethod("units<-"): no applicable method for 'units<-' applied to an object of class "c('double', 'numeric')"
dd <- datadist(age, sex)
#> Error in datadist(age, sex): could not find function "datadist"
options(datadist='dd')
S <- Surv(dt,e)
#> Error in Surv(dt, e): could not find function "Surv"

# categorize age to demonstrate issue
age2 = cut2(age, g = 3)
#> Error in cut2(age, g = 3): could not find function "cut2"

# fit model using age categories
f = cph(S ~ age2*strat(sex), x = TRUE)
#> Error in cph(S ~ age2 * strat(sex), x = TRUE): could not find function "cph"

Created on 2021-07-22 by the reprex package (v1.0.0)

Just to demonstrate the issue I'm having, I've created quantile-based age categories instead of using continuous age, then fit a model that has age category as a predictor which is interacted with sex as a stratification factor.

Now, when I try to make a prediction from this model I get an error.

#try to make prediction from model object f
predict(f, newdata = data.frame(age2 = "[43.6,54.6)", sex="Male"))
#> Error in matxv(X, coeff, kint = kint): columns in a (5) must be <= length of b (4)

Created on 2021-07-22 by the reprex package (v1.0.0)

This seems odds, but asking instead for the X matrix for those predictor values I think we see why the issue arises.

#try to get X matrix for object f
predict(f, newdata = data.frame(age2 = "[43.6,54.6)", sex="Male"), type = "x")
#>   age2[43.6,54.6) age2[54.6,98.9] age2[11.3,43.6):strat(sex)Male
#> 1               1               0                              0
#>   age2[43.6,54.6):strat(sex)Male age2[54.6,98.9]:strat(sex)Male
#> 1                              1                              0
#> attr(,"strata")
#> [1] sex=Male
#> Levels: sex=Female sex=Male

Created on 2021-07-22 by the reprex package (v1.0.0)

It looks like what's happening is that an extra column is sneaking into X called age2[11.3,43.6):strat(sex)Male which wouldn't be represented by a beta coeficient in the Cox PH model.

@harrelfe
Copy link
Owner

harrelfe commented Sep 5, 2021

Please provide a reproducible example that does not throw any errors before predict(), i.e., where you properly require(rms) or library(rms) at the beginning of the code R can find label<- and units<-.

@colinorourke
Copy link
Author

Was so excited to see reprex working I missed the errors. Sorry! Updated now with fully working code.

library(rms) 
#> Loading required package: Hmisc
#> Loading required package: lattice
#> Loading required package: survival
#> Loading required package: Formula
#> Loading required package: ggplot2
#> 
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#> 
#>     format.pval, units
#> Loading required package: SparseM
#> 
#> Attaching package: 'SparseM'
#> The following object is masked from 'package:base':
#> 
#>     backsolve

#example from cph help page
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n, 
                     rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
label(dt) <- 'Follow-up Time'
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
dd <- datadist(age, sex)
options(datadist='dd')
S <- Surv(dt,e)

# categorize age to demonstrate issue
age2 = cut2(age, g = 3)

# fit model using age categories
f = cph(S ~ age2*strat(sex), x = TRUE)

#try to make prediction from model object f
predict(f, newdata = data.frame(age2 = "[43.6,54.6)", sex="Male"))
#> Error in matxv(X, coeff, kint = kint): columns in a (5) must be <= length of b (4)

#try to get X matrix for object f
predict(f, newdata = data.frame(age2 = "[43.6,54.6)", sex="Male"), type = "x")
#>   age2[43.6,54.6) age2[54.6,98.9] age2[11.3,43.6):strat(sex)Male
#> 1               1               0                              0
#>   age2[43.6,54.6):strat(sex)Male age2[54.6,98.9]:strat(sex)Male
#> 1                              1                              0
#> attr(,"strata")
#> [1] sex=Male
#> Levels: sex=Female sex=Male

Created on 2021-09-08 by the reprex package (v1.0.0)

@bdeonovic
Copy link

Bump

such predictions fail with base coxph as well so I'm guessing this particular use case has not been accounted for in either codebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants