This repo contains the R project for modeling overnight service occupancy in homeless shelters in the city of Toronto. Note that this R project is used along with the DBT project, which houses the data models for the project.
- Data Extraction: Extract shelter data from the Open Data Toronto API and weather forecast data from the AccuWeather API.
- Data Loading: Load raw data into BigQuery for further transformation using DBT.
- Machine Learning: Extract transformed data from BigQuery. Use H2O AutoML to create machine learning models that predict overnight shelter occupancy for the next 5 days based on multiple features.
- Data Reporting: Report on predictions using R-related technologies and packages like Quarto, datatable, etc.
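As a rough illustration of the machine learning step above, the sketch below builds lagged occupancy features in base R and then hands them to H2O AutoML. The column names (`occupancy_date`, `occupied_beds`), the helpers `add_lag_features()` and `train_occupancy_model()`, the toy data, and the runtime budget are all assumptions made for illustration, not the project's actual code. In the real pipeline the input would come from the DBT models in BigQuery, and the sketch assumes a single daily time series.

```r
# Hypothetical helper: add lagged occupancy columns so a model can learn
# from previous days. Column names are assumptions for illustration.
add_lag_features <- function(df, lags = c(1, 7)) {
  df <- df[order(df$occupancy_date), , drop = FALSE]
  n <- nrow(df)
  for (k in lags) {
    # Shift the series down by k rows; the first k rows have no history.
    df[[paste0("occupied_beds_lag", k)]] <-
      c(rep(NA_real_, min(k, n)), head(df$occupied_beds, max(n - k, 0)))
  }
  df
}

# Hypothetical helper: train an AutoML model and return the leader.
# Requires the h2o package and a Java runtime; nothing runs until called.
train_occupancy_model <- function(train_df, target = "occupied_beds",
                                  max_runtime_secs = 60) {
  h2o::h2o.init()
  train_h2o <- h2o::as.h2o(train_df)
  # Use every column except the target and the raw date as a predictor.
  predictors <- setdiff(names(train_df), c(target, "occupancy_date"))
  aml <- h2o::h2o.automl(
    x = predictors,
    y = target,
    training_frame = train_h2o,
    max_runtime_secs = max_runtime_secs,
    seed = 1
  )
  aml@leader
}

# Toy data, made up purely to show the shape of the input.
shelter_daily <- data.frame(
  occupancy_date = as.Date("2024-01-01") + 0:9,
  occupied_beds  = c(50, 52, 51, 55, 54, 53, 56, 58, 57, 59)
)

features <- add_lag_features(shelter_daily)

# Uncomment to train; rows without full lag history are dropped first.
# model <- train_occupancy_model(na.omit(features))
```

Calls to `h2o` are namespaced with `h2o::` so the feature helper can be used and tested without starting an H2O cluster.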
- Shelter Data: Toronto Open Data
- Weather Data: NOAA BigQuery data for historical weather and the AccuWeather API for weather forecasts.
- DBT: For data transformation and modeling.
- Google BigQuery: As the data warehouse for storing and querying the dataset.
- Posit: For data manipulation using multiple packages like `tidyverse`, `data.table`, `DT`, `DBI`, and `h2o`.
- H2O AutoML: For building machine learning models.
- Quarto: For reporting.
- `/scripts_dev`: Contains development scripts for analysis.
- `/scripts_prod`: Contains production scripts that run on a schedule.
- `/functions`: Contains functions to modularize code.
- `/app`: Contains scripts for the Shiny app.
Coming soon...