Fix ascertainment vignette
pratikunterwegs committed Jul 31, 2023
1 parent 9ab7833 commit 89b3f1b
Showing 1 changed file with 23 additions and 9 deletions.
vignettes/estimate_ascertainment.Rmd: 32 changes (23 additions, 9 deletions)
@@ -1,5 +1,5 @@
---
title: "Estimating the proportion of cases that are ascertained during an outbreak"
output:
  bookdown::html_vignette2:
    fig_caption: yes
@@ -9,7 +9,7 @@ pkgdown:
bibliography: resources/library.json
link-citations: true
vignette: >
  %\VignetteIndexEntry{Estimating the proportion of cases that are ascertained during an outbreak}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
@@ -43,9 +43,18 @@ We wish to estimate one of the typical severity quantities used in epidemiology
* A time-series of cases, hospitalisations or some other proxy for infections over time;
* A time-series of deaths;
* A delay distribution, describing the probability that an individual will die $t$ days after they were initially infected.

The first two elements are expected to be included in a data frame with the columns "date", "cases", and "deaths"; see below for examples.

The delay distribution is expected to be specified as an object of the class `<epidist>` from the package [_epiparameter_](https://epiverse-trace.github.io/epiparameter/).
:::
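As an illustration of the expected input, the sketch below builds a small toy data frame and checks that it has the required columns. The counts are invented placeholders, the column names assume `date`, `cases`, and `deaths` (as in the `select()` call used to prepare the UK data later in this vignette), and `check_severity_data()` is a hypothetical helper, not a function from _cfr_.

```r
# Hypothetical toy data, NOT real surveillance data: one row per day.
df_toy <- data.frame(
  date = seq(as.Date("2020-03-01"), by = "day", length.out = 5),
  cases = c(10L, 15L, 22L, 30L, 41L),
  deaths = c(0L, 0L, 1L, 1L, 2L)
)

# Hypothetical helper: stop early if the data lack the assumed layout.
check_severity_data <- function(data) {
  required <- c("date", "cases", "deaths")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Dates must be Date objects and counts must be non-negative.
  stopifnot(
    inherits(data$date, "Date"),
    all(data$cases >= 0),
    all(data$deaths >= 0)
  )
  invisible(TRUE)
}

check_severity_data(df_toy)
```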

This vignette shows how to use the `estimate_reporting()` function in _cfr_ to estimate the proportion of cases, infections or hospitalisations ascertained, and consequently, the extent of under-ascertainment or under-reporting.
Note that under-ascertainment may have a number of causes (see above), while under-reporting refers specifically to cases that occur but are not reported (for example, because of a lack of testing capacity).
Users should take care to use the correct terminology for their specific situation.

We have nonetheless named the function `estimate_reporting()` because we expect users to be most interested in situations where testing capacity is itself the limiting factor in ascertainment.
This was the case during the Covid-19 pandemic before testing capacity was substantially increased, and is likely to be the case in a future pandemic caused by a novel pathogen.

First, load _cfr_ and the packages used to access and plot the data.

@@ -68,13 +77,12 @@ library(ggplot2)

The function `estimate_reporting()` from the _cfr_ package estimates the proportion of cases, infections, hospitalisations -- or whichever proxy for infections is provided -- which have been ascertained.

<!-- The method used within this function extends the methods outlined in the previous vignettes about estimating the severity during an ongoing outbreak and measuring how the severity changes over time. -->

The methods build on those of @nishiura2009 for estimating severity, and extend them by combining the resulting severity estimates with an assumed baseline severity.

The proportion of cases (or infections, or whichever other quantity is provided) that have been ascertained is given by the ratio of the assumed true baseline severity to the delay-adjusted severity estimate.
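To see why this ratio measures ascertainment, consider a hypothetical outbreak with 1,000 cases in total, of which only 200 are reported, and 14 deaths. Assuming all deaths are observed and delays have already been accounted for, the true severity is 14/1000 and the severity seen in the data is 14/200; their ratio recovers the reported fraction, 200/1000. All numbers below are invented for illustration.

```r
# Hypothetical numbers for illustration only (not real data).
true_cases <- 1000     # all cases, ascertained or not
reported_cases <- 200  # cases that were ascertained
deaths <- 14           # assumed to be fully observed

severity_baseline <- deaths / true_cases      # assumed true severity: 0.014
severity_estimate <- deaths / reported_cases  # severity seen in the data: 0.07

# Proportion ascertained = baseline severity / estimated severity
ascertainment <- severity_baseline / severity_estimate
ascertainment  # equals reported_cases / true_cases, i.e. 0.2
```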

The delay-adjusted severity estimates can be calculated using either the `estimate_static()` or the `estimate_time_varying()` functions.
See the vignettes on [estimating a static measure of disease severity](estimate_static_severity.html) and [estimating a time-varying measure of disease severity](estimate_time_varying_severity.html), respectively, for more details on each of these functions.
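For intuition about what "delay-adjusted" means, the sketch below computes a simplified static severity estimate in the spirit of @nishiura2009: total deaths divided by the expected number of cases whose outcome would be known by the end of the data, given the delay distribution. It uses a discretised gamma delay as a stand-in for an `<epidist>` object; the gamma parameters, the toy counts, and the helper names are hypothetical, and the sketch returns only a point estimate without uncertainty bounds, so it is not the implementation used in _cfr_.

```r
# Discretised delay pmf: probability that death occurs j days after onset,
# for j = 0, ..., max_delay. Gamma parameters here are hypothetical.
delay_pmf <- function(max_delay, shape, scale) {
  diff(pgamma(0:(max_delay + 1), shape = shape, scale = scale))
}

# Expected number of cases with a known outcome by the last day:
# the sum over days t of sum_j cases[t - j] * pmf[j].
known_outcomes <- function(cases, pmf) {
  total <- 0
  for (t in seq_along(cases)) {
    j <- 0:min(t - 1, length(pmf) - 1)
    total <- total + sum(cases[t - j] * pmf[j + 1])
  }
  total
}

# Simplified delay-adjusted (static) severity: a point estimate only.
static_severity <- function(cases, deaths, pmf) {
  sum(deaths) / known_outcomes(cases, pmf)
}

# Toy example with made-up counts.
cases <- c(50, 80, 120, 160, 200, 220, 240)
deaths <- c(0, 0, 1, 2, 3, 5, 6)
pmf <- delay_pmf(max_delay = 30, shape = 2, scale = 2)
naive <- sum(deaths) / sum(cases)
adjusted <- static_severity(cases, deaths, pmf)
c(naive = naive, adjusted = adjusted)
```

Because recent cases have not yet had time to die, the denominator shrinks relative to the raw case count, so the adjusted estimate is never lower than the naive one during a growing outbreak.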

### Preparing the raw data

@@ -97,6 +105,9 @@ df_covid_uk <- select(
  df_covid_uk, date,
  cases = new_cases, deaths = new_deaths
)
# view the data format
df_covid_uk
```

We then subset the data to focus on just the first few months of the outbreak.
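A minimal sketch of this kind of subsetting keeps only the rows up to a cutoff date. The toy data and the cutoff of 31 March 2020 are hypothetical and are not the period used in this vignette; with _dplyr_ loaded, `filter(df_toy, date <= cutoff)` would do the same.

```r
# Hypothetical daily data for 2020 (placeholder counts, not real data).
df_toy <- data.frame(
  date = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day"),
  cases = 0L,
  deaths = 0L
)

# Hypothetical cutoff: keep roughly the first three months.
cutoff <- as.Date("2020-03-31")
df_early <- df_toy[df_toy$date <= cutoff, ]

range(df_early$date)
nrow(df_early)  # 91 days, as 2020 is a leap year
```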
@@ -167,10 +178,13 @@ We use the `estimate_reporting()` function within the _cfr_ package to calculate

The function includes a `type` argument, which determines whether `estimate_static()` or `estimate_time_varying()` is used to estimate the delay-adjusted severity of the disease.
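The difference between the two modes can be sketched as follows: a static estimate pools all deaths and delay-adjusted cases into a single severity, while a time-varying estimate computes one value per day; either is then turned into an ascertainment estimate by dividing the baseline severity by it. `ascertainment_sketch()` and its `type` values are hypothetical and may not match the arguments of `estimate_reporting()`, and the time-varying branch is a bare daily ratio without any smoothing, so days with few events give unstable values.

```r
ascertainment_sketch <- function(cases, deaths, pmf, severity_baseline,
                                 type = c("static", "varying")) {
  type <- match.arg(type)
  # Expected number of cases on each day whose outcome is known:
  # the case series convolved with the delay pmf.
  known <- vapply(seq_along(cases), function(t) {
    j <- 0:min(t - 1, length(pmf) - 1)
    sum(cases[t - j] * pmf[j + 1])
  }, numeric(1))
  severity <- if (type == "static") {
    sum(deaths) / sum(known)  # one pooled estimate
  } else {
    deaths / known            # one estimate per day
  }
  severity_baseline / severity
}

# Toy example: every death occurs exactly one day after onset.
cases <- rep(100, 4)
deaths <- c(0, 2, 2, 2)
ascertainment_sketch(cases, deaths, pmf = c(0, 1),
                     severity_baseline = 0.01, type = "static")   # 0.5
ascertainment_sketch(cases, deaths, pmf = c(0, 1),
                     severity_baseline = 0.01, type = "varying")  # NaN 0.5 0.5 0.5
```

Day 1 gives `NaN` because no outcomes are known yet (0/0), which illustrates why real time-varying estimators need to handle early and sparse days with care.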

The ascertainment ratio is calculated as the 'known' disease severity divided by the delay-adjusted severity estimated from the data; the known severity is expected to be known, or assumed, from our best knowledge of the pathology of the disease.

This known disease severity is passed to the `severity_baseline` argument in `estimate_reporting()`, and forms the numerator in the resulting ascertainment calculation.

We assume that the 'true' CFR of Covid-19 is 0.014.
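Since ascertainment is the baseline severity divided by the estimated severity, the bounds of the severity estimate swap roles when converted: the upper severity bound gives the lower ascertainment bound, and vice versa. A worked example with the assumed baseline of 0.014 and hypothetical severity bounds:

```r
severity_baseline <- 0.014  # assumed true CFR, as in the text

# Hypothetical delay-adjusted severity estimate with lower/upper bounds.
severity <- c(mean = 0.05, low = 0.04, high = 0.07)

# Divide the baseline by each estimate; note that the bounds swap places.
ascertainment <- c(
  mean = severity_baseline / severity[["mean"]],
  low  = severity_baseline / severity[["high"]],
  high = severity_baseline / severity[["low"]]
)
round(ascertainment, 3)  # mean 0.28, low 0.20, high 0.35
```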

The other arguments are the same as those found in `estimate_time_varying()`.

```{r }
df_reporting_static <- estimate_reporting(
@@ -232,7 +246,7 @@ df_covid <- select(
)
```

We use data science tools to apply the `estimate_reporting()` function iteratively across data grouped by country.
We refer users to the book [R for Data Science](https://r4ds.had.co.nz/) for a fuller explanation of some of the code used here, including functions from the packages in [the Tidyverse](https://www.tidyverse.org/).
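The grouped computation follows a split-apply-combine pattern. The vignette uses Tidyverse tools for this; the sketch below shows the same pattern in base R on hypothetical two-country toy data, and for brevity it uses a naive (not delay-adjusted) severity per country in place of a call to `estimate_reporting()`.

```r
# Hypothetical multi-country toy data (placeholder counts, not real data).
df_countries <- data.frame(
  country = rep(c("A", "B"), each = 3),
  cases = c(100, 100, 100, 200, 200, 200),
  deaths = c(1, 2, 3, 5, 5, 5)
)
severity_baseline <- 0.014

# Split by country, compute severity and implied ascertainment per group,
# then bind the per-country results back into one data frame.
results <- do.call(rbind, lapply(
  split(df_countries, df_countries$country),
  function(d) {
    severity <- sum(d$deaths) / sum(d$cases)
    data.frame(
      country = d$country[1],
      severity = severity,
      ascertainment = severity_baseline / severity
    )
  }
))
results  # A: ascertainment 0.70, B: ascertainment 0.56
```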

