
My submission for the Capstone project for DataTalks Zoomcamp cohort of 2023


Zesky665/DEZC_2023_Capstone


Collecting data on the cost of spot instances across the major Cloud Providers (AWS, Azure)

Objectives

  • Create a DWH for storing pricing data from various cloud providers.
  • Create orchestration that will pull data from the sources periodically.
  • Create a dashboard that will show relevant metrics.
  • Create a workflow to automatically deploy all of this.

Technology used

  • Cloud: AWS
  • Containerization: Docker with Docker-Compose
  • Infrastructure: Terraform
  • DWH: Redshift
  • Orchestration: Prefect
  • Data Transformation: Pandas
  • Data Visualization: Metabase

Setup instructions

To run it locally, see Local setup.

To run it with GitHub Actions, see GitHub Setup.

DWH Database Schema

Data diagram

Data Sources:

Tech Diagram

Tech diagram

Dashboard

Link to dashboard

Tech diagram

Insights

  • Lower-powered AWS spot instances often cost as much as on-demand instances. Even when spot capacity is available, the savings are much smaller than typically advertised.
  • Bigger instances come with bigger discounts. For example, m5a.large spot instances are 44% cheaper than on-demand, while a1.medium spot instances are the same price as on-demand.
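
The discount figures above are the relative difference between the on-demand and spot price. A minimal sketch of that arithmetic follows. The function name `spot_savings_pct` and the prices are hypothetical, chosen only to illustrate the calculation; they are not values from the project's dataset or code.

```python
def spot_savings_pct(on_demand_price: float, spot_price: float) -> float:
    """Return how much cheaper spot is than on-demand, as a percentage."""
    if on_demand_price <= 0:
        raise ValueError("on_demand_price must be positive")
    return (on_demand_price - spot_price) / on_demand_price * 100


# Hypothetical hourly prices in USD (on-demand, spot), for illustration only.
examples = {
    "larger-instance": (0.100, 0.056),
    "smaller-instance": (0.020, 0.020),
}

for name, (on_demand, spot) in examples.items():
    savings = spot_savings_pct(on_demand, spot)
    print(f"{name}: {savings:.0f}% cheaper than on-demand")
```

A result of 0% means the spot price matches the on-demand price, which is the situation described for the smaller instance types.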

To-Do

  • Add GCP data.
  • Add persistence for Metabase.
  • Add data quality tests to dbt flow.

Acknowledgements

Thanks to the instructors:

Thanks to colleagues:

  • Anna Geller, her articles on Prefect DataOps have been a huge influence.
  • Andy Nelson, for telling me about ZoomCamp and generally being a great mentor.
  • Matt Little, his articles on Terraform and AWS were what made this entire project possible.

Contact information
