Simple ELT pipeline that extracts NY Taxi Trips data, transforms it, and makes the results available for further analysis.
- Project built for study purposes as part of the DataTalksClub Data Engineering Zoomcamp course.
In this project, cloud resources are created and managed with Terraform. Workflow orchestration is handled by Prefect, which coordinates the Python ETL and dbt (data transformation) steps and uses its Google Cloud Platform integrations to communicate with cloud services (GCS, BigQuery). It also includes a Discord integration that sends a notification every time a deployment runs. The Docker images that containerize the Prefect server and the Prefect agent are pushed to Google Artifact Registry and then deployed on Google Compute Engine, which runs the Prefect server and the Prefect agent. Finally, the data is served in Looker Studio.
- Python
- GCP - Google Cloud Platform
- Infrastructure as Code (IaC): Terraform
- Data Lake: Google Cloud Storage
- Data Warehouse: BigQuery
- Artifact Registry
- Google Compute Engine
- Containerization: Docker
- Workflow Orchestration: Prefect
- Data Transformation: dbt
- Data Visualization: Looker Studio
- Notifications Webhook: Discord
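
The orchestration order described above (run the pipeline steps in sequence, then notify Discord with the outcome) can be sketched in plain Python. This is an illustrative sketch only, not the project's actual Prefect flow: `run_pipeline`, `build_discord_payload`, `notify_discord`, the step names, and the message text are all hypothetical. The only external fact relied on is that a Discord webhook accepts a JSON body with a `content` field.

```python
import json
import urllib.request
from typing import Callable, List, Tuple

# A step is a (name, callable) pair; the names used here are hypothetical.
Step = Tuple[str, Callable[[], None]]


def build_discord_payload(flow_name: str, status: str) -> bytes:
    """Encode a webhook message as JSON with a 'content' field."""
    message = f"Deployment '{flow_name}' finished with status: {status}"
    return json.dumps({"content": message}).encode("utf-8")


def notify_discord(webhook_url: str, flow_name: str, status: str) -> None:
    """POST the status message to a Discord webhook URL."""
    request = urllib.request.Request(
        webhook_url,
        data=build_discord_payload(flow_name, status),
        headers={
            "Content-Type": "application/json",
            # Assumption: an explicit User-Agent avoids rejections that the
            # default urllib agent can trigger on some servers.
            "User-Agent": "elt-pipeline-sketch/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


def run_pipeline(steps: List[Step]) -> str:
    """Run steps in order; stop at the first failure and report it."""
    for name, step in steps:
        try:
            step()
        except Exception:
            return f"Failed at {name}"
    return "Completed"
```

In a real run, the steps would wrap the extract, load to GCS, load to BigQuery, and dbt stages, and the string returned by `run_pipeline` would be passed to `notify_discord` together with the webhook URL. In the project itself this coordination is done by Prefect rather than by a hand-written loop.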