This repository is a dbt package containing data extracted from FiveThirtyEight's data repository.
The package is intended to be used as a way to rapidly load interesting, curated
data sets into your database of choice.
To load data from this package, install it into your dbt project just like any other package:
add it to your packages.yml file and run dbt deps.
packages:
  - git: "https://github.com/stkbailey/fivethirtyeight-open-data.git"
    revision: 0.1.0
Afterwards, you'll need to indicate which projects you'd like to load by specifying the folder
name in the seeds config block of dbt_project.yml. (Example below.) The next time you run
dbt seed, the data will load!
seeds:
  fivethirtyeight:
    bob_ross:
      enabled: true
    fandango:
      enabled: true
    tarantino:
      enabled: true
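If you want to enable many folders at once, the block above can be generated rather than typed by hand. The following is a minimal sketch, not part of this package: `render_seeds_block` is a hypothetical helper, and it assumes the nesting shown in the example (seeds, then the package name, then one entry per folder).

```python
def render_seeds_block(package_name: str, folders: list[str]) -> str:
    """Render a seeds config block that enables each listed folder.

    Hypothetical helper for illustration; the nesting mirrors the
    example in this README and is an assumption, not a dbt API.
    """
    lines = ["seeds:", f"  {package_name}:"]
    for folder in folders:
        lines.append(f"    {folder}:")
        lines.append("      enabled: true")
    return "\n".join(lines) + "\n"


# Folder names taken from the example in this README.
print(render_seeds_block("fivethirtyeight", ["bob_ross", "fandango", "tarantino"]))
```

Paste the printed text into dbt_project.yml, merging it with any seeds block you already have.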
Data in this package are pulled from FiveThirtyEight's data repository, then minimally processed
to make them compliant with dbt. This includes, for each project:
- Reformatting the README.md file into a schema.yml file.
- Renaming all csv files to be <project_name>_<file_name>.csv.
- Trimming large files (of a customizable size).
The code for re-downloading files is found in download_and_process_files.py.
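For orientation, the renaming and trimming steps could look roughly like the sketch below. This is not the contents of download_and_process_files.py: the function names `prefixed_name` and `trim_csv` are hypothetical, and it assumes the "customizable size" is a cap on the number of data rows, which may differ from what the script actually does.

```python
import csv
from pathlib import Path


def prefixed_name(project_name: str, file_name: str) -> str:
    """Return the seed file name in <project_name>_<file_name>.csv form."""
    stem = Path(file_name).stem  # drop the original extension
    return f"{project_name}_{stem}.csv"


def trim_csv(source: Path, target: Path, max_rows: int) -> int:
    """Copy the header plus at most max_rows data rows.

    Returns the number of data rows written. Treating the size limit as
    a row count is an assumption made for this sketch.
    """
    written = 0
    with source.open(newline="") as src, target.open("w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: nothing to copy
        writer.writerow(header)
        for row in reader:
            if written >= max_rows:
                break
            writer.writerow(row)
            written += 1
    return written
```

A caller would compute the target name with `prefixed_name(project, original_file)` and then pass the source path, the target path, and a row limit of its choosing to `trim_csv`.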
See https://data.fivethirtyeight.com/ for a list of the data and code FiveThirtyEight has published.
Unless otherwise noted, these data sets are available under the Creative Commons Attribution 4.0 International License, and the code is available under the MIT License. If you find this information useful, please let us know.