Data ingestion is done using Python and can be found under load_data folder. JSON events in "events.jsonl.bz2" are loaded as pandas dataframe. Then the data is ingested to Redshift Database through "load_dataset_to_redshift.py"
Data Transformation is done using dbt and can be found under transformed_data/models folder.
- staging: Intermediate data transformations are created as views
- marts: Final analytics tables are created as a_device_session.sql, b_school_session.sql, c_device_usage_history.sql, and d_master_school.sql files for analytics purposes
From the four final tables, some basic descriptive statistics were derived.
dbt_run_artifacts directory is managed by Github Actions Workflow so do not modify. This directory stores the dbt state file.
Suganthi Jaganathan (September, 2023 -) - @SuganthiJagan
Suganthi Jaganathan - @SuganthiJagan