This repo contains all the data pipeline code shown at SMX Advanced EU 2023. It includes implementation details for the four data products presented in the session:
- a 404 live alert on GA4 data, executed via a cloud function
- a sitemap monitoring script, based on advertools
- a custom SEO crawler, based on advertools
- a webvitals monitoring script
For an introduction, see this SMX Advanced deck: https://www.slideshare.net/ChristopherGutknecht/
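To give a feel for what the sitemap monitoring product listed above does, here is a minimal sketch of the core idea: comparing two sitemap snapshots and reporting which URLs were added or removed. It is not the code from this repo. The function name `diff_sitemap_snapshots`, the report keys, and the example URLs are hypothetical and only serve as an illustration. In a real setup, the URL lists could come from the `loc` column of the DataFrame returned by `advertools.sitemap_to_df()`.

```python
from datetime import datetime, timezone


def diff_sitemap_snapshots(previous_urls, current_urls):
    """Compare two sitemap snapshots and report added and removed URLs.

    Hypothetical helper for illustration; not part of this repo or advertools.
    """
    previous, current = set(previous_urls), set(current_urls)
    return {
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


# Hypothetical snapshots. In practice, each list could be taken from the
# `loc` column of the DataFrame returned by advertools.sitemap_to_df().
yesterday = [
    "https://example.com/",
    "https://example.com/a",
    "https://example.com/b",
]
today = [
    "https://example.com/",
    "https://example.com/b",
    "https://example.com/c",
]

report = diff_sitemap_snapshots(yesterday, today)
print("added:", report["added"])
print("removed:", report["removed"])
```

A report like this could then be written to a BigQuery table on each run, so that changes in the sitemap can be queried over time with SQL.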
- You need a Google Cloud Account with a valid billing account
- You need to know how to deploy a Python cloud function in GCP
- You need a working knowledge of SQL (or a data team) and basic BigQuery skills
- For an introduction to the dbt framework, see here: https://www.getdbt.com/blog/what-exactly-is-dbt/
- To learn the necessary fundamentals of dbt, see this course: https://courses.getdbt.com/courses/fundamentals
- To get started with dbt Cloud (recommended for beginners), see here: https://docs.getdbt.com/docs/get-started/dbt-cloud-features
Check the webvitals_to_bigquery folder and the README.md file.