This issue is a place to record some notes about the response variable, as well as a place to discuss and record some of the decisions made when constructing the model.
For example, the current model structure is a binary classifier. However, each outbreak in the data provided by WAHIS has both an outbreak start date and an outbreak end date. Sometimes, that interval can be very long. For example, an outbreak of 12 cases was reported over an interval of 398 days. That's only ~0.03 cases per day across the interval.
One approach could be to use the interval as an offset. That would make sense if intervals reflected differing amounts of sampling effort, but instead, the interval is a characteristic of how that particular outbreak played out or was reported. Intervals here don't represent differences in 'exposure time' so I don't think it's appropriate to use them to standardize the modeled rates to a per-unit time basis.
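For reference, this is roughly what the offset approach would look like if the response were modeled as a count. This is a sketch only: the Poisson form and the symbols $Y_i$ (cases in outbreak $i$), $t_i$ (interval length in days), and $\mathbf{x}_i$ (predictors) are assumptions for illustration, not part of the current model.

$$
\log \mathbb{E}[Y_i] = \log t_i + \mathbf{x}_i^\top \boldsymbol{\beta}
\quad\Longleftrightarrow\quad
\frac{\mathbb{E}[Y_i]}{t_i} = \exp\!\left(\mathbf{x}_i^\top \boldsymbol{\beta}\right)
$$

The coefficient on $\log t_i$ is fixed at 1, so the model describes a rate per day. That reading only holds when $t_i$ measures exposure or sampling effort, which is the assumption questioned above.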
The simplest solution would be to forecast whether an outbreak will start during the forecast interval. That's easy to convert to a binary value and has a more natural interpretation.
# For each selected model date, find the outbreaks (and their polygons) that start in the following 30 days
rvf_points_polygon_dates <- map_dfr(model_dates_selected, function(model_date) {
  # Days from the model date to each outbreak's start date
  day_diff <- rvf_points_polygon$outbreak_start_date - model_date
  # Keep outbreaks starting 1 to 30 days after the model date and tag them with that date
  rvf_points_polygon[which(day_diff >= 1 & day_diff <= 30), ] |>
    mutate(date = model_date)
})
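The filter above returns only the outbreaks that start in each window. A binary classifier also needs the polygon and date combinations with no outbreak. Below is a minimal base R sketch of one way to build that 0/1 response. The toy data, the `polygon_id` column, and the names `outbreaks`, `model_dates`, `polygon_ids`, and `response` are all hypothetical and are not taken from the project.

```r
# Toy inputs; every name and value here is hypothetical
outbreaks <- data.frame(
  polygon_id = c("A", "B", "B"),
  outbreak_start_date = as.Date(c("2020-01-10", "2020-02-20", "2020-03-05")),
  stringsAsFactors = FALSE
)
model_dates <- as.Date(c("2020-01-01", "2020-02-15"))
polygon_ids <- c("A", "B", "C")

# One row per polygon and model date, including polygons with no outbreaks
response <- data.frame(
  polygon_id = rep(polygon_ids, times = length(model_dates)),
  date = rep(model_dates, each = length(polygon_ids)),
  stringsAsFactors = FALSE
)

# 1 if any outbreak in the polygon starts 1 to 30 days after the model date, else 0
response$outbreak_start <- vapply(seq_len(nrow(response)), function(i) {
  starts <- outbreaks$outbreak_start_date[outbreaks$polygon_id == response$polygon_id[i]]
  day_diff <- as.numeric(starts - response$date[i])
  as.integer(any(day_diff >= 1 & day_diff <= 30))
}, integer(1))
```

Several outbreaks starting in the same polygon and window collapse to a single 1, which matches the "will an outbreak start during the forecast interval" framing.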