Welcome to my session on "Building data quality with stakeholders"

1. INTRO. Data quality is like a campfire 🔥

We want people to stay and spend time on data quality... but HOW?

How do people gather round a campfire? Marshmallows!

Marshmallows:
- Data events that stakeholders really care about
  - Example: anomalies in leading indicators, which they then don't have to monitor themselves
- BUT we also need to mix in healthy snacks: boring tests like unique and not_null

2. POLL. Let's run a live poll to get to know you!

Link: https://www.menti.com/alxmw7icd8ks Code: 4553 5553

3. DEMO. Let's dive into some examples!

Types of heavily used data tests @ Bergzeit 🏔️

| # | Test type | Example use cases | Stakeholder interaction |
|---|---|---|---|
| 1 | Live alerts | 404 errors | High |
| 2 | Volume changes | Googlebot crawl volume | Medium |
| 3 | Date completeness | Missing dates | Low |
| 4 | dbt generic tests | not_null, unique | None |
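Rows 2 and 3 of the table reduce to simple checks. The following is a minimal Python sketch, assuming the observed dates and daily row counts have already been queried from the warehouse; the function names and the 50 % day-over-day threshold are hypothetical and not part of the Bergzeit setup, which runs such tests in dbt:

```python
from datetime import date, timedelta


def missing_dates(observed: list[date], start: date, end: date) -> list[date]:
    """Date completeness: return every day in [start, end] that has no rows."""
    seen = set(observed)
    all_days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [day for day in all_days if day not in seen]


def volume_changed(today: int, yesterday: int, max_change: float = 0.5) -> bool:
    """Volume change: flag a day-over-day change larger than max_change (0.5 = 50 %)."""
    if yesterday == 0:
        # No baseline: any rows today count as a change, zero rows do not.
        return today != 0
    return abs(today - yesterday) / yesterday > max_change
```

For example, `missing_dates([date(2024, 10, 1), date(2024, 10, 2), date(2024, 10, 4)], date(2024, 10, 1), date(2024, 10, 4))` returns `[date(2024, 10, 3)]`, and `volume_changed(400, 1000)` returns `True` because the volume dropped by 60 %.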

Live alerts demo:

See recording: https://www.linkedin.com/events/buildingdataqualitywithyourstak7243873241482579968/comments/
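The repository contains a check_live_alerts.py script, but its contents are not reproduced here. As a hypothetical illustration only, a 404 live alert could boil down to filtering status codes and formatting a short message; the `UrlCheck` record and both function names are assumptions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UrlCheck:
    url: str
    status_code: int  # HTTP status returned when the URL was last requested


def find_404s(checks: list[UrlCheck]) -> list[str]:
    """Return the URLs whose last check returned 404 Not Found."""
    return [check.url for check in checks if check.status_code == 404]


def format_alert(broken: list[str], max_listed: int = 5) -> Optional[str]:
    """Build a short alert text, or None if there is nothing to alert on."""
    if not broken:
        return None
    lines = [f"{len(broken)} URL(s) returned 404:"]
    lines += [f"- {url}" for url in broken[:max_listed]]
    if len(broken) > max_listed:
        lines.append(f"... and {len(broken) - max_listed} more")
    return "\n".join(lines)
```

Returning `None` instead of an empty message keeps the "no news, no alert" rule explicit, which matters for a high-interaction alert type.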

Our daily data quality alert forwarding pipeline @ Bergzeit 🏔️

1. We use dbt Cloud to manage models and run tests.
2. We store test failures in BigQuery with an on-run-end macro, store_test_results.
3. A Pub/Sub topic listens for table updates via a log sink created in Logs Explorer and triggers a Cloud Function.
4. The post_dbt_test_result Cloud Function queries the test_results dbt model, converts the resulting dataframe to HTML and sends it to MS Teams.
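A stdlib-only sketch of the last step, assuming the failed tests arrive as a list of dicts (the column names are hypothetical) and that the Teams channel uses an incoming webhook that accepts a JSON body with a `text` field. The actual function presumably uses pandas' `DataFrame.to_html()`; `rows_to_html` below is only a stand-in for it, and `post_to_webhook` needs a real webhook URL to be called:

```python
import html
import json
import urllib.request


def rows_to_html(rows: list[dict]) -> str:
    """Render test-result rows as a minimal HTML table (stand-in for DataFrame.to_html())."""
    if not rows:
        return ""
    headers = list(rows[0].keys())
    head = "".join(f"<th>{html.escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(row.get(h, '')))}</td>" for h in headers) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def build_teams_payload(rows: list[dict]) -> dict:
    """Wrap the HTML table in a simple JSON body with a 'text' field (assumed webhook format)."""
    return {"text": f"<b>{len(rows)} dbt test failure(s)</b><br>{rows_to_html(rows)}"}


def post_to_webhook(url: str, payload: dict) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Escaping every cell matters here: test names and failing values come straight from the warehouse and may contain `<` or `&`.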

This is how ChatGPT visualizes the process: :-D

How we resolve data tests with our stakeholders

- Use dbt Explorer to get the compiled code of the failed test and run the query
- Provide stakeholders with additional Looker Studio dashboards for data visualization
- Rotate first-response duty for test failures daily among the data team
  - First-time failure? Threshold too strict? Deduplication needed via a window function?
  - Is the alert meaningful? Is there an adequate business response, or is there nothing to be done?
- Keep improving model descriptions to 1) provide sufficient context, 2) describe worst-case scenarios and 3) list specific resolution steps
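Two of the points above as small, hypothetical Python helpers: a round-robin rotation that assigns one first responder per calendar day, and a Python analog of the window-function deduplication `ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts DESC) = 1`, which keeps only the newest row per key. Team member names and column names are placeholders:

```python
from datetime import date


def first_responder(team: list[str], day: date) -> str:
    """Round robin: cycle through the team, one first responder per calendar day."""
    if not team:
        raise ValueError("team must not be empty")
    return team[day.toordinal() % len(team)]


def latest_per_key(rows: list[dict], key: str, order_by: str) -> list[dict]:
    """Keep the newest row per key, like
    ROW_NUMBER() OVER (PARTITION BY key ORDER BY order_by DESC) = 1 in SQL.
    On ties, the first row seen is kept."""
    latest: dict = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())
```

Using the date's ordinal makes the rotation deterministic without storing any state, so every alert job computes the same responder for the same day.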

Best practices for improving data test resolution and collaboration

Route different tests into different channels (Teams, Slack) using tags, at either the model or the test level:

  - dbt_expectations.expect_table_row_count_to_be_between:
      # The three Cloud Scheduler jobs for DACH query 350 URLs each (3 × 350 = 1,050), so at least 900 rows are expected for the current date
      min_value: 900
      row_condition: "date = current_date()" # (Optional)
      strictly: false
      config:
        severity: warn
        tags: ["analytics-alerts"]
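The routing itself happens outside dbt. A minimal sketch, assuming the alerting function has access to each failed test's tags and its model's tags; the `TAG_TO_CHANNEL` mapping, the `seo-alerts` tag and the `data-team` default channel are hypothetical:

```python
DEFAULT_CHANNEL = "data-team"

# Hypothetical mapping from dbt tags to chat channels.
TAG_TO_CHANNEL = {
    "analytics-alerts": "analytics",
    "seo-alerts": "seo",
}


def route(
    test_tags: list[str],
    model_tags: list[str],
    tag_to_channel: dict[str, str] = TAG_TO_CHANNEL,
    default: str = DEFAULT_CHANNEL,
) -> set[str]:
    """Return the channels a failed test goes to; tags may sit on the test, the model, or both."""
    channels = {tag_to_channel[tag] for tag in [*test_tags, *model_tags] if tag in tag_to_channel}
    # Untagged or unmapped tests still reach someone.
    return channels or {default}
```

For the test above, `route(["analytics-alerts"], [])` returns `{"analytics"}`; a test without a mapped tag falls back to `{"data-team"}`.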

Set an owner via meta on all models:

  - name: stg_gsc_inspection_logs
    description: >
      This model lists the Search Console inspection logs for a list of daily tested URLs.
      See source description for more details
    meta:
      owner: "@Chris G"
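Owners are only useful if every model has one. A sketch of a CI check over dbt's `target/manifest.json` (written by `dbt parse` or any run) that lists models without `meta.owner`. Depending on the dbt version, `meta` may appear at node level, under `config`, or both, so the sketch checks both; treat the exact manifest layout as an assumption to verify against your dbt version:

```python
import json
from pathlib import Path
from typing import Optional


def owner_of(node: dict) -> Optional[str]:
    """Read meta.owner from a manifest node, checking node-level meta first, then config.meta."""
    meta = node.get("meta") or node.get("config", {}).get("meta") or {}
    return meta.get("owner")


def models_without_owner(manifest: dict) -> list[str]:
    """List unique_ids of models that have no meta.owner set."""
    return sorted(
        uid
        for uid, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "model" and not owner_of(node)
    )


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Load the manifest artifact from disk."""
    return json.loads(Path(path).read_text())
```

In CI, `models_without_owner(load_manifest())` returning a non-empty list could fail the build before an unowned model reaches production.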

Assign a group to every folder as a fallback for model ownership:

census_syncs:
  +group: data_team

channel_attribution:
  +group: customer_acquisition
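`+group` only takes effect if the groups are declared in a YAML `groups:` block, each with an owner. A hypothetical sketch of the fallback logic an alerting function could apply to manifest data: prefer the model's `meta.owner`, else the owner of its group. The manifest shape used here (`groups` keyed by unique ID, each entry with `name` and `owner`) is an assumption:

```python
from typing import Optional


def resolve_owner(node: dict, groups: dict) -> Optional[str]:
    """Prefer the model's own meta.owner; otherwise fall back to its group's owner."""
    owner = (node.get("meta") or {}).get("owner")
    if owner:
        return owner
    group_name = node.get("group")  # e.g. "data_team", set via +group in dbt_project.yml
    if not group_name:
        return None
    for group in groups.values():
        if group.get("name") == group_name:
            # A group owner carries a name and/or an email.
            group_owner = group.get("owner") or {}
            return group_owner.get("name") or group_owner.get("email")
    return None
```

The model-level owner wins so that individual responsibility is not diluted; the group only catches models nobody has claimed yet.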

Resources:

dbt expectations: https://github.com/calogica/dbt-expectations

Files in this repository:
- check_live_alerts.py
- cloud_components_chatgpt.png
- post_dbt_test_results.py
- store_test_failures.sql

Repository: ChrisGutknecht/data-quality-with-stakeholders-and-dbt
