This project aligns with the TodoFEC initiative to create a standardized set of data tasks for comparing data processing frameworks. Our focus is on developing dbt models that transform and analyze U.S. Federal Election Commission (FEC) campaign finance data to provide insights into campaign contributions, expenditure patterns, and donor networks.
- Explore the data by querying the FEC Parquet files on S3 with DuckDB.
- Once you decide what changes you want to make, download the dataset and make your dbt model changes.
The FEC data for this project is available as Parquet files in an S3 bucket, so you can query it in place with DuckDB without downloading it first.
- Install DuckDB:
pip install duckdb
- Open DuckDB:
duckdb
- Run a query: the following SQL counts the rows in pac_summary_2024.parquet, reading it directly from S3:
select count(*) from read_parquet('s3://datarecce-todofec/pac_summary_2024.parquet');
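For one-off queries from a terminal, the duckdb CLI's -c option runs a single command and exits. The sketch below assumes that workflow: fec_preview_sql is a hypothetical helper, not part of this repository, that only builds the SQL text for previewing the first rows of a file, and the default of 5 rows is an arbitrary choice.

```shell
# Hypothetical helper (not part of this repo): print a query that previews
# the first rows of one TodoFEC Parquet file, addressed by its S3 URI.
fec_preview_sql() {
  uri="$1"
  limit="${2:-5}"   # number of rows to show; defaults to 5
  printf "SELECT * FROM read_parquet('%s') LIMIT %d;" "$uri" "$limit"
}

# Example (needs the duckdb CLI and network access to S3):
#   duckdb -c "$(fec_preview_sql s3://datarecce-todofec/pac_summary_2024.parquet)"
```

Building the SQL in a function keeps the quoting of the S3 URI in one place, which is the part that is easiest to get wrong when typing it by hand.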
Here are the S3 URIs of the available datasets:
s3://datarecce-todofec/all_candidates_2024.parquet
s3://datarecce-todofec/candidate_master_2024.parquet
s3://datarecce-todofec/candidate_committee_linkage_2024.parquet
s3://datarecce-todofec/house_senate_2024.parquet
s3://datarecce-todofec/committee_master_2024.parquet
s3://datarecce-todofec/pac_summary_2024.parquet
s3://datarecce-todofec/contributions_from_committees_to_candidates_2024.parquet
s3://datarecce-todofec/operating_expenditures_2024.parquet
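Because every file follows the same s3://datarecce-todofec/<name>_2024.parquet pattern, a row count across all of them can be generated from the list above. This is a sketch: FEC_DATASETS and fec_count_all_sql are hypothetical names, and the query is only printed so you can pass it to duckdb yourself.

```shell
# Dataset names taken from the S3 URIs listed above.
FEC_DATASETS="all_candidates candidate_master candidate_committee_linkage house_senate committee_master pac_summary contributions_from_committees_to_candidates operating_expenditures"

# Print one UNION ALL query that counts the rows in every dataset.
fec_count_all_sql() {
  first=1
  for name in $FEC_DATASETS; do
    if [ "$first" -eq 0 ]; then
      printf '\nUNION ALL\n'
    fi
    printf "SELECT '%s' AS dataset, count(*) AS n_rows FROM read_parquet('s3://datarecce-todofec/%s_2024.parquet')" "$name" "$name"
    first=0
  done
  printf ';\n'
}

# Example (needs the duckdb CLI and network access to S3):
#   duckdb -c "$(fec_count_all_sql)"
```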
See TodoFEC-parser for how the Parquet files are prepared.
To make and track your changes, first fork this repository to your own GitHub account. This will create a personal copy that you can modify.
- Fork the Repository: Click "Fork" at the top of this GitHub page.
- Clone Your Fork:
git clone https://github.com/your-username/TodoFEC-dbt.git
cd TodoFEC-dbt
Before you begin, you'll need Python with Poetry, plus Node.js and npm for the Evidence app, installed on your system.
Install the Python dependencies:
poetry install
Once installation has completed, enter the Poetry environment (on Poetry 2.x, poetry shell requires the separate poetry-plugin-shell plugin):
poetry shell
Once you've updated any models, run dbt within the Poetry environment:
dbt run
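While iterating, you may not want to rebuild every model. dbt's --select flag with the + graph operator (model_name+) runs a model and everything downstream of it. The helper below is a hypothetical sketch, assuming your models live under models/ and that you compare against main; it turns the changed .sql paths reported by git diff into such a selector. Since dbt names a model after its file, basename is enough.

```shell
# Hypothetical helper: read changed file paths (one per line on stdin) and
# print a dbt selector covering each changed model plus its downstream models.
changed_models_selector() {
  sel=""
  while IFS= read -r file; do
    case "$file" in
      *.sql) sel="$sel $(basename "$file" .sql)+" ;;  # "+" suffix = model and its children
    esac
  done
  printf '%s\n' "${sel# }"   # drop the leading space
}

# Example (--diff-filter=d leaves out deleted files):
#   dbt run --select $(git diff --name-only --diff-filter=d main -- models/ | changed_models_selector)
```

If no model files changed, the selector is empty, so check before running. When you already have artifacts from a base run, such as the target-base directory produced in the Recce steps, dbt's built-in state:modified+ selector with --state target-base may be an alternative.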
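To browse the results in the Evidence app under ./evidence, install its dependencies, optionally seed and build the prod target, then generate the sources and start the development server:
npm --prefix ./evidence install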
dbt seed -t prod # Optional
dbt build -t prod # Optional
npm --prefix ./evidence run sources
npm --prefix ./evidence run dev
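The Evidence steps above can be collected into one function. This is a sketch, not a script shipped with the repository: evidence_up, run, DRY_RUN and BUILD_PROD are hypothetical names. BUILD_PROD=1 enables the two optional prod steps, and DRY_RUN prints the commands instead of executing them.

```shell
# Run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Same order as the Evidence steps above; stops at the first failure.
evidence_up() {
  run npm --prefix ./evidence install || return 1
  if [ "${BUILD_PROD:-0}" = 1 ]; then   # the optional prod seed/build
    run dbt seed -t prod || return 1
    run dbt build -t prod || return 1
  fi
  run npm --prefix ./evidence run sources || return 1
  run npm --prefix ./evidence run dev
}

# Example: BUILD_PROD=1 evidence_up
```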
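Recce is a data-validation toolkit. Once you've updated models, you can use Recce to validate your changes by comparing a base environment, built from main with the prod target into target-base, against the current environment built from your feature branch: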
# Prepare the base environment
git checkout main
dbt seed -t prod --target-path target-base
dbt run -t prod --target-path target-base
dbt docs generate -t prod --target-path target-base
# Prepare the current environment
git checkout <feature_branch>
dbt seed
dbt run
dbt docs generate
# Launch Recce
recce server
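The same preparation can be wrapped in a function that stops at the first failing step, so Recce is not launched against half-built base artifacts. A sketch only: prepare_recce, run and DRY_RUN are hypothetical names, and the commands are exactly the ones listed above.

```shell
# Run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

prepare_recce() {
  branch="${1:-}"
  if [ -z "$branch" ]; then
    echo "usage: prepare_recce <feature_branch>" >&2
    return 2
  fi
  # Base environment: main branch, prod target, artifacts written to target-base/
  run git checkout main || return 1
  run dbt seed -t prod --target-path target-base || return 1
  run dbt run -t prod --target-path target-base || return 1
  run dbt docs generate -t prod --target-path target-base || return 1
  # Current environment: feature branch, default target and target path
  run git checkout "$branch" || return 1
  run dbt seed || return 1
  run dbt run || return 1
  run dbt docs generate
}

# Example: prepare_recce my-feature && recce server
```

As in the manual steps, you end up on the feature branch. Setting DRY_RUN=1 prints the eight commands without running them, which is a cheap way to check the order before building anything.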