
Swap dataproc batch_id declaration to model config #804

Merged

14 commits merged into dbt-labs:main on Aug 11, 2023

Conversation

@nickozilla (Contributor) commented Jun 29, 2023

resolves #671

Description

My initial implementation in #727 does allow users to set the batch_id for Dataproc Serverless models in the profile, and it does apply to a Python model, but it wouldn't work for any project with more than one Python model (which rather defeats the point of including it at all), because the batch_id parameter needs to be unique, and uniqueness can't be achieved via a profile-level definition.

I've created this new implementation, which declares batch_id at the model level instead and is the preferred approach; a sketch of the idea follows.
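
For illustration only, a minimal sketch of what a model-level declaration could look like with this change (the file name, upstream ref, and batch_id value are hypothetical, not taken from this PR's code):

```python
# models/my_python_model.py  (hypothetical file name)
def model(dbt, session):
    dbt.config(
        submission_method="serverless",          # run on Dataproc Serverless
        batch_id="analytics-my-model-20230629",  # hypothetical unique ID for this model
    )
    # Hypothetical upstream model; the point is only that batch_id now
    # lives alongside the model it identifies.
    return dbt.ref("upstream_model")
```

Because each model carries its own batch_id, two Python models in the same project no longer collide the way a single profile-level value would.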

Checklist

@cla-bot cla-bot bot added the cla:yes label Jun 29, 2023
@nickozilla nickozilla marked this pull request as ready for review June 30, 2023 08:27
@nickozilla nickozilla requested a review from a team as a code owner June 30, 2023 08:27
@nickozilla (Contributor Author)

@dbeatty10 @colin-rogers-dbt - I realised that the implementation I merged in didn't actually address the desired functionality; this PR fixes it.

@dbeatty10 dbeatty10 added the ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering label Jul 12, 2023
@dbeatty10 (Contributor)

@nickozilla we're doing release candidates (RC) for dbt-core v1.6 and their associated adapters right now.

So that dbt-bigquery is in a solid state for the RC for 1.6 and to avoid the issue described in #822, we're going to use #826 to revert the original PR in #727.

Then going forward, this PR (#804) can be used as the new implementation of the batch_id feature, with the aim of having it ready for inclusion in dbt v1.7.

@dbeatty10 (Contributor)

@nickozilla could you add at least one test case that covers this new behavior? Ideally, it would include at least two different dbt Python models each with a different custom batch_id.

Let us know if you need any help finding an existing test as a template to follow.

.changes/unreleased/Fixes-20230630-092618.yaml (review thread, resolved)
nickozilla and others added 2 commits July 23, 2023 14:10
Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com>
@dbeatty10 (Contributor)

@nickozilla could you reply to dbt-labs/docs.getdbt.com#3718 with some code examples that we can include in the documentation?

@nickozilla (Contributor Author) commented Aug 1, 2023

Hi @dbeatty10, sorry for the delay getting back to you. I've set up the dev environment locally and added a test to tests/functional/adapter/test_python_model.py, but the infra we're using doesn't fit well in the test framework, mainly because of the lack of support for defining the params we use in our profile (most of which are not optional):

```yaml
target: dev
outputs:
  dev:
    dataset: dev
    job_execution_timeout_seconds: 2000
    job_retries: 1
    location: EU
    method: oauth
    priority: interactive
    project: "{{ env_var('WAREHOUSE_ANALYTICS_PROJECT_ID') }}"
    threads: 8
    type: bigquery
    dataproc_region: europe-west1
    gcs_bucket: "{{ env_var('PYTHON_DBT_MODELS_BUCKET') }}"
    dataproc_batch:
      environment_config:
        execution_config:
          service_account: "dbt-py@{{ env_var('WAREHOUSE_ANALYTICS_PROJECT_ID') }}.iam.gserviceaccount.com"
          subnetwork_uri: "projects/{{ env_var('NETWORK_PROJECT_ID') }}/regions/europe-west1/subnetworks/dataproc"
          network_tags: ["dataproc-ingress"]
          staging_bucket: "{{ env_var('PYTHON_DBT_STAGING_BUCKET') }}"
      pyspark_batch:
        jar_file_uris:
          ["gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.29.0.jar"]
      runtime_config:
        container_image: "{{ env_var('PYTHON_DBT_IMAGESTORE') }}:{{ env_var('PYTHON_DBT_TAG') }}"
```

AFAICT the test framework assumes Python models are using a cluster on the default project network, and gives no extra support for the parameters we're setting via the test.env file. Is there a way I can set these parameters for the tests? Otherwise I can try adding some tests and having the CI on this PR run the checks, but that's not ideal.
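
One possible avenue, sketched here as an assumption rather than a confirmed fix: the dbt functional test framework lets a test class redefine the `dbt_profile_target` pytest fixture to supply its own profile dict. Whether every Dataproc key is honoured there is an open question; the env var names below are taken from the profile above:

```python
import os

import pytest


@pytest.fixture(scope="class")
def dbt_profile_target():
    # Sketch only: override the test profile with the Dataproc settings
    # from the profile shown earlier in this thread.
    return {
        "type": "bigquery",
        "method": "oauth",
        "threads": 8,
        "project": os.environ["WAREHOUSE_ANALYTICS_PROJECT_ID"],
        "dataset": "dev",
        "location": "EU",
        "dataproc_region": "europe-west1",
        "gcs_bucket": os.environ["PYTHON_DBT_MODELS_BUCKET"],
    }
```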

I've also included a model yaml configuration example in the issue you've linked, and tested it locally, see below.

[Screenshot, 2023-08-01: local run output showing the model-level batch_id being applied]
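
For readers without the screenshot, the configuration tested was along these lines (the model name and batch_id value here are illustrative, not the exact ones from the linked issue):

```yaml
# models/schema.yml  (illustrative model name and batch_id value)
models:
  - name: my_python_model
    config:
      submission_method: serverless
      batch_id: my-unique-batch-id
```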

@dbeatty10 (Contributor)

@nickozilla no worries!

If you commit your changes to tests/functional/adapter/test_python_model.py, we can try running them against the CI we have set up and see if it works or not 🤞 That might reduce the burden of resolving the mismatch you mentioned.

@nickozilla (Contributor Author)

@dbeatty10 I've added the tests as requested, though I'm not sure I'm using assert len() correctly.
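
As a reference point, a functional test in the style of tests/functional/adapter/test_python_model.py might look roughly like the sketch below; the model bodies, file names, and the expected count of 2 are assumptions for illustration, not the PR's actual test:

```python
import pytest

from dbt.tests.util import run_dbt

# Hypothetical Python models, each declaring its own unique batch_id.
FIRST_MODEL = """
def model(dbt, session):
    dbt.config(submission_method="serverless", batch_id="batch-id-one")
    return session.createDataFrame([(1,)], ["id"])
"""

SECOND_MODEL = """
def model(dbt, session):
    dbt.config(submission_method="serverless", batch_id="batch-id-two")
    return session.createDataFrame([(2,)], ["id"])
"""


class TestPythonModelBatchId:
    @pytest.fixture(scope="class")
    def models(self):
        return {
            "first_model.py": FIRST_MODEL,
            "second_model.py": SECOND_MODEL,
        }

    def test_unique_batch_ids(self, project):
        # run_dbt returns one result per executed node, so two Python
        # models should yield two results.
        results = run_dbt(["run"])
        assert len(results) == 2
```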

@colin-rogers-dbt colin-rogers-dbt enabled auto-merge (squash) August 11, 2023 21:06
@colin-rogers-dbt colin-rogers-dbt merged commit 3510f76 into dbt-labs:main Aug 11, 2023
10 checks passed
Labels
cla:yes · ok to test · ready_for_review (Externally contributed PR has functional approval, ready for code review from Core engineering)
Development

Successfully merging this pull request may close these issues.

[ADAP-462] [Feature] Allow override of CreateBatchRequest for batch_id
4 participants