A standardized data project using the US Election Campaign Finance dataset to create a 'TodoMVC-equivalent' for comparing data processing frameworks.

DataRecce/TodoFEC-dbt

TodoFEC-dbt

This project aligns with the TodoFEC initiative to create a standardized set of data tasks for comparing data processing frameworks. Our focus is on developing dbt models that transform and analyze the U.S. Election Campaign Finance dataset to provide insights into campaign contributions, expenditure patterns, and donor networks.

Quickstart

  • Explore the data by following the "Query FEC Data on S3 with DuckDB" section.
  • Once you have decided what changes you want to make, download the dataset and edit the dbt models.

Query FEC Data on S3 with DuckDB

The FEC data for this project is available as Parquet files in an S3 bucket, so you can query it directly with DuckDB without downloading anything first.

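  1. Install DuckDB
   pip install duckdb
   Note: this installs the DuckDB Python package. If the duckdb command in the next step is not found, install the DuckDB CLI separately.
  2. Open the DuckDB shell
   duckdb
  3. Run a query: use the following command to query a Parquet file directly from S3
  select count(*) from read_parquet('s3://datarecce-todofec/pac_summary_2024.parquet');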

The available datasets are at the following S3 URIs:

s3://datarecce-todofec/all_candidates_2024.parquet
s3://datarecce-todofec/candidate_master_2024.parquet
s3://datarecce-todofec/candidate_committee_linkage_2024.parquet
s3://datarecce-todofec/house_senate_2024.parquet
s3://datarecce-todofec/committee_master_2024.parquet
s3://datarecce-todofec/pac_summary_2024.parquet
s3://datarecce-todofec/contributions_from_committees_to_candidates_2024.parquet
s3://datarecce-todofec/operating_expenditures_2024.parquet
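
The same count query can also be run from Python against every file listed above. The following is a minimal sketch, not part of this repository: the helper names (`dataset_uri`, `count_query`, `count_rows`) are hypothetical, and it assumes the bucket allows anonymous reads and that your DuckDB version can load its S3 support automatically.

```python
# Sketch: count rows in each TodoFEC Parquet file with DuckDB's Python API.
# Helper names are illustrative and not part of this repository.

BUCKET = "s3://datarecce-todofec"

# Dataset names taken from the S3 URIs listed above (all are 2024 files).
DATASETS = [
    "all_candidates",
    "candidate_master",
    "candidate_committee_linkage",
    "house_senate",
    "committee_master",
    "pac_summary",
    "contributions_from_committees_to_candidates",
    "operating_expenditures",
]


def dataset_uri(name: str) -> str:
    """Build the S3 URI for one of the listed 2024 datasets."""
    return f"{BUCKET}/{name}_2024.parquet"


def count_query(uri: str) -> str:
    """Return the count(*) query from the quickstart for a given URI."""
    return f"select count(*) from read_parquet('{uri}');"


def count_rows() -> dict[str, int]:
    """Run the count query for every dataset. Requires network access."""
    import duckdb  # imported lazily so the helpers above work without it

    con = duckdb.connect()  # in-memory database
    return {
        name: con.execute(count_query(dataset_uri(name))).fetchone()[0]
        for name in DATASETS
    }
```

Calling `count_rows()` returns a dictionary mapping each dataset name to its row count. Importing `duckdb` inside the function keeps the URI and query helpers usable even where DuckDB is not installed.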

Check out TodoFEC-parser to see how the Parquet files are prepared.

Get Ready to Make dbt Model Changes

Fork This Repository

To make and track your changes, first fork this repository to your own GitHub account. This will create a personal copy that you can modify.

  1. Fork the Repository: Click "Fork" at the top of this GitHub page.
  2. Clone Your Fork:
  git clone https://github.com/your-username/TodoFEC-dbt.git
  cd TodoFEC-dbt

System Prerequisites

Before you begin, you'll need the following on your system:

  • Python >= 3.12
  • Python Poetry >= 1.8
  • NPM >= 7
  • git
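
To confirm these prerequisites before installing anything, a small check script can help. The sketch below is an illustration only: the function names are hypothetical, and it assumes that `poetry --version` and `npm --version` print a version number containing a `major.minor` pair.

```python
# Sketch: check the prerequisites listed above. Names are illustrative.
import re
import shutil
import subprocess
import sys

# Minimum versions taken from the prerequisites list.
REQUIRED = {"poetry": (1, 8), "npm": (7, 0)}


def parse_version(text: str) -> tuple[int, int]:
    """Extract the first 'major.minor' pair found in a version string."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_tool(name: str, minimum: tuple[int, int]) -> bool:
    """Return True if `name` is on PATH and reports at least `minimum`."""
    path = shutil.which(name)
    if path is None:
        return False
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    try:
        return parse_version(result.stdout) >= minimum
    except ValueError:
        return False


def check_all() -> dict[str, bool]:
    """Check Python, Poetry, NPM, and git; True means the requirement is met."""
    results = {"python": sys.version_info[:2] >= (3, 12)}
    for name, minimum in REQUIRED.items():
        results[name] = check_tool(name, minimum)
    results["git"] = shutil.which("git") is not None  # any version is accepted
    return results
```

Running `print(check_all())` shows which requirements are satisfied on the current machine.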

Setup dependencies

Install the Python dependencies:

poetry install

Using the poetry environment

Once installation has completed you can enter the poetry environment.

poetry shell

Running dbt

Once you've updated any models you can run dbt within the poetry environment by simply calling:

dbt run

Visualize with Evidence

Setup Evidence

npm --prefix ./evidence install

Prepare data

dbt seed -t prod   # Optional
dbt build -t prod  # Optional
npm --prefix ./evidence run sources

Launch Evidence

npm --prefix ./evidence run dev

Validate model changes with Recce

Recce is a data-validation toolkit.

Prepare the environment

Once you've updated your models, you can use Recce to validate the changes:

# Prepare the base environment
git checkout main
dbt seed -t prod --target-path target-base
dbt run -t prod --target-path target-base
dbt docs generate -t prod --target-path target-base

# Prepare the current environment
git checkout <feature_branch>
dbt seed
dbt run
dbt docs generate

# Launch Recce
recce server
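
If you repeat this workflow often, the steps above could be wrapped in a small script. The following is a sketch under stated assumptions rather than a tool shipped with this repository: the function names are hypothetical, the commands are copied from the steps above, and the script defaults to a dry run that only prints each command.

```python
# Sketch: run the Recce preparation steps in order. Names are illustrative.
import subprocess

# Base environment: built from main with the prod target into target-base.
BASE_STEPS = [
    ["git", "checkout", "main"],
    ["dbt", "seed", "-t", "prod", "--target-path", "target-base"],
    ["dbt", "run", "-t", "prod", "--target-path", "target-base"],
    ["dbt", "docs", "generate", "-t", "prod", "--target-path", "target-base"],
]


def current_steps(feature_branch: str) -> list[list[str]]:
    """Commands that build the current environment from a feature branch."""
    return [
        ["git", "checkout", feature_branch],
        ["dbt", "seed"],
        ["dbt", "run"],
        ["dbt", "docs", "generate"],
    ]


def run_steps(steps: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is True."""
    for cmd in steps:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


def prepare(feature_branch: str, dry_run: bool = True) -> None:
    """Build the base environment, then the current one."""
    run_steps(BASE_STEPS, dry_run)
    run_steps(current_steps(feature_branch), dry_run)
```

For example, `prepare("my-feature", dry_run=False)` would execute both sets of commands, after which `recce server` can be launched as shown above. Here `my-feature` stands in for your own branch name.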
