The goal of covid19R is to provide a single package that allows users to access all of the tidy covid-19 datasets collected by data packages that implement the covid19R tidy data standard.
To learn more about the Covid19R project, check our extensive documentation about data standards, how to get your data added to this list, and more.
You can install the development version from GitHub with:
remotes::install_github("covid19r/covid19r")
To see what datasets are available, use get_covid19_data_info():
library(covid19R)
data_info <- get_covid19_data_info()
head(data_info) %>% knitr::kable()
data_set_name | package_name | function_to_get_data | data_details | data_url | license_url | data_types | location_types | spatial_extent | has_geospatial_info | get_info_passing | refresh_status | last_refresh_update |
---|---|---|---|---|---|---|---|---|---|---|---|---|
covid19nytimes_states | covid19nytimes | refresh_covid19nytimes_states | Open Source data from the New York Times on distribution of confirmed Covid-19 cases and deaths in the US States. For more, see https://www.nytimes.com/article/coronavirus-county-data-us.html or the readme at https://github.com/nytimes/covid-19-data. | https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv | https://github.com/nytimes/covid-19-data/blob/master/LICENSE | cases_total, deaths_total | state | country | FALSE | TRUE | Passed | 2020-05-04 16:08:36 |
covid19nytimes_counties | covid19nytimes | refresh_covid19nytimes_counties | Open Source data from the New York Times on distribution of confirmed Covid-19 cases and deaths in the US by County. For more, see https://www.nytimes.com/article/coronavirus-county-data-us.html or the readme at https://github.com/nytimes/covid-19-data. | https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv | https://github.com/nytimes/covid-19-data/blob/master/LICENSE | cases_total, deaths_total | state | country | FALSE | TRUE | Passed | 2020-05-04 16:08:39 |
covid19france | covid19france | refresh_covid19france | Open Source data from opencovid19-fr on distribution of confirmed Covid-19 cases and deaths in the US States. For more, see https://github.com/opencovid19-fr/data. | https://raw.githubusercontent.com/opencovid19-fr/data/master/dist/chiffres-cles.csv | https://github.com/opencovid19-fr/data/blob/master/LICENSE | confirmed, dead, icu, hospitalized, recovered, discovered | county, region, country, overseas collectivity | country | FALSE | TRUE | Passed | 2020-05-04 16:08:47 |
CanadaC19_cases | CanadaC19 | refresh_CanadaC19_cases | Open Source data from multiple public reporting data throughout Canada. For more, see https://github.com/ishaberry/Covid19Canada. | https://raw.githubusercontent.com/ishaberry/Covid19Canada/master/cases.csv | https://github.com/debusklaneml/CanadaC19/blob/master/LICENSE | cases_new | state | state | FALSE | TRUE | Passed | 2020-05-04 16:08:48 |
covid19us | covid19us | refresh_covid19us | Open Source data from COVID Tracking Project on the distribution of Covid-19 cases and deaths in the US. For more, see https://github.com/opencovid19-fr/data. | https://covidtracking.com/api | https://github.com/aedobbyn/covid19us/blob/master/LICENSE.md | positive, negative, pending, hospitalized_currently, hospitalized_cumulative, in_icu_currently, in_icu_cumulative, on_ventilator_currently, on_ventilator_cumulative, recovered, death, hospitalized, total, total_test_results, death_increase, hospitalized_increase, negative_increase, positive_increase, total_test_results_increase | state | country | FALSE | TRUE | Passed | 2020-05-04 16:08:50 |
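Because the info table is an ordinary data frame, it can be filtered like any other to narrow down candidates. The following is a minimal sketch, assuming dplyr is installed; the info tibble is a hand-built stand-in holding a few columns and values copied from the table above, not the result of a live call.

```r
library(dplyr)

# Hand-built stand-in for part of the info table (values copied from the table above)
info <- tibble(
  data_set_name = c("covid19nytimes_states", "covid19nytimes_counties",
                    "covid19france", "CanadaC19_cases"),
  data_types = c("cases_total, deaths_total", "cases_total, deaths_total",
                 "confirmed, dead, icu, hospitalized, recovered, discovered",
                 "cases_new"),
  spatial_extent = c("country", "country", "country", "state")
)

# Keep country-level datasets that report cumulative deaths
wanted <- info %>%
  filter(spatial_extent == "country",
         grepl("deaths_total", data_types, fixed = TRUE)) %>%
  pull(data_set_name)

wanted
```

With the stand-in rows above, only the two New York Times datasets remain.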
Once you have figured out which dataset you want, you can access it with get_covid19_dataset():
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
nytimes_states <- get_covid19_dataset("covid19nytimes_states")
#> Parsed with column specification:
#> cols(
#> date = col_date(format = ""),
#> location = col_character(),
#> location_type = col_character(),
#> location_code = col_character(),
#> location_code_type = col_character(),
#> data_type = col_character(),
#> value = col_double()
#> )
nytimes_states %>%
  filter(date == max(date)) %>%
  filter(data_type == "cases_total") %>%
  arrange(desc(value)) %>%
  head()
#> # A tibble: 6 x 7
#> date location location_type location_code location_code_t… data_type
#> <date> <chr> <chr> <chr> <chr> <chr>
#> 1 2020-05-03 New York state 36 fips_code cases_to…
#> 2 2020-05-03 New Jer… state 34 fips_code cases_to…
#> 3 2020-05-03 Massach… state 25 fips_code cases_to…
#> 4 2020-05-03 Illinois state 17 fips_code cases_to…
#> 5 2020-05-03 Califor… state 06 fips_code cases_to…
#> 6 2020-05-03 Pennsyl… state 42 fips_code cases_to…
#> # … with 1 more variable: value <dbl>
While many datasets have their own unique additional columns (e.g., latitude, longitude, population), all datasets have the following columns and are arranged in a long format:
- date - The date in YYYY-MM-DD form
- location - The name of the location as provided by the data source. The counties dataset provides county and state. They are combined and separated by a comma (,), and can be split by tidyr::separate(), if you wish.
- location_type - The type of location using the covid19R controlled vocabulary. Nested locations are indicated by multiple location types being combined with an underscore (_).
- location_code - A standardized location code using a national or international standard. In this case, FIPS state or county codes. See https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code and https://en.wikipedia.org/wiki/FIPS_county_code for more.
- location_code_type - The type of standardized location code being used according to the covid19R controlled vocabulary. Here we use fips_code.
- data_type - The type of data in that given row. Includes cases_total and deaths_total, cumulative measures of both.
- value - The number of cases of each data type.
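As a rough sketch of working with this layout, the example below splits a combined county and state location with tidyr::separate() and then spreads data_type into one column per measure with tidyr::pivot_wider(). The counties tibble is hand-built with made-up values, and the exact separator (a comma with or without a following space) is an assumption, so the pattern accepts both.

```r
library(dplyr)
library(tidyr)

# Hand-built example rows in the long layout; the values are made up
counties <- tibble(
  date = as.Date("2020-05-03"),
  location = c("Cook,Illinois", "Cook,Illinois",
               "Kings, New York", "Kings, New York"),
  data_type = c("cases_total", "deaths_total", "cases_total", "deaths_total"),
  value = c(100, 10, 50, 5)
)

wide <- counties %>%
  # Split "county,state" into two columns; allow an optional space after the comma
  separate(location, into = c("county", "state"), sep = ",\\s*") %>%
  # One row per date and location, one column per data type
  pivot_wider(names_from = data_type, values_from = value)

wide
```

The long layout is usually the easier one for filtering and plotting; the wide form is mainly convenient when comparing measures side by side.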
The location_type, location_code_type, and data_type from datasets and spatial_extent from the data info table all have their own controlled vocabularies. Others might be introduced as the collection of packages matures. To see the possible values of a standardized vocabulary, use get_covid19_controlled_vocab():
get_covid19_controlled_vocab("location_type") %>%
knitr::kable()
location_type | description |
---|---|
continent | continental scale |
country | a country with soverign borders |
state | a spatial area inside that country such as a state, province, canton, etc. |
county | a spatial area demarcated within a state |
city | a single municipality - the smallest spatial grain of government in a country |
canton | the cantons of Switzerland and Principality of Liechtenstein (FL) |
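One possible use of a controlled vocabulary is checking that a location_type value, including a nested one joined with an underscore, is built only from known terms. The sketch below uses base R only. The vocab vector is copied by hand from the table above and may not match the live vocabulary, and is_valid_location_type() is a hypothetical helper written for this illustration, not a function provided by covid19R.

```r
# Terms copied from the table above; the live vocabulary may contain more
vocab <- c("continent", "country", "state", "county", "city", "canton")

# Hypothetical helper: TRUE when every "_"-separated part is a known term
is_valid_location_type <- function(x, vocab) {
  parts <- strsplit(x, "_", fixed = TRUE)
  vapply(parts, function(p) length(p) > 0 && all(p %in% vocab), logical(1))
}

# "province" is not in the vocabulary, so the last value fails the check
is_valid_location_type(c("state", "county_state", "province"), vocab)
# TRUE TRUE FALSE
```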