
[ADAP-945] [Bug] submission_method from dbt profile not being applied to dbt Python models #967

Open
2 tasks done
gbmarc1 opened this issue Oct 12, 2023 · 3 comments
Labels
bug Something isn't working help_wanted Extra attention is needed

Comments

@gbmarc1

gbmarc1 commented Oct 12, 2023

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I have the following profile. I want the job to be submitted to the Dataproc cluster named in the profile, but it always ends up as a serverless batch.

ml:
  target: dev
  outputs:
    dev: &dev_config
      type: bigquery
      dataset: "{{ env_var('USER') }}"
      project: shopify-ml-adhoc
      priority: interactive
      method: oauth
      location: US
      job_execution_timeout_seconds: 600
      job_retries: 1
      threads: 2
      submission_method: cluster
      dataproc_region: us-central1
      gcs_bucket: ml-adhoc-dataproc-jobs
      dataproc_cluster_name: ml-adhoc-dataproc-us-central1

This is the model. If I uncomment the dbt.config call, it works properly, but I want this configuration in the profile, not in the model itself.

def model(dbt, session):
    # dbt.config(
    #     submission_method="cluster",
    #     dataproc_cluster_name="ml-adhoc-dataproc-us-central1",
    # )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

Expected Behavior

The profile config is respected and the job is executed in the cluster.
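To make the expected precedence concrete, here is a minimal sketch. It assumes that a value set in the model's dbt.config(...) should override the profile, that the profile value should be used when the model sets nothing, and that serverless is only the last-resort default. The names resolve_python_setting and DEFAULT_SUBMISSION_METHOD are hypothetical and are not part of dbt-bigquery.

```python
from typing import Any, Mapping, Optional

# Assumed last-resort default: the issue reports that jobs end up as
# serverless batches when nothing else takes effect.
DEFAULT_SUBMISSION_METHOD = "serverless"


def resolve_python_setting(
    key: str,
    model_config: Mapping[str, Any],
    profile: Mapping[str, Any],
    default: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical resolver: model config wins, then the profile, then a default."""
    if model_config.get(key) is not None:
        return model_config[key]
    if profile.get(key) is not None:
        return profile[key]
    return default


# Values copied from the profiles.yml in this issue.
profile = {
    "submission_method": "cluster",
    "dataproc_region": "us-central1",
    "gcs_bucket": "ml-adhoc-dataproc-jobs",
    "dataproc_cluster_name": "ml-adhoc-dataproc-us-central1",
}

# The model with its dbt.config(...) call commented out sets nothing.
model_config: dict = {}

method = resolve_python_setting(
    "submission_method", model_config, profile, DEFAULT_SUBMISSION_METHOD
)
cluster = resolve_python_setting("dataproc_cluster_name", model_config, profile)
print(method, cluster)  # cluster ml-adhoc-dataproc-us-central1
```

Under these assumptions, the profile's submission_method: cluster is picked up even though the model sets no config, which is the behavior this issue asks for.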

Steps To Reproduce

dbt run

Relevant log output

dbt run --models nsfw
15:50:47  Running with dbt=1.6.6
15:50:47  Registered adapter: bigquery=1.6.7
15:50:47  Unable to do partial parsing because profile has changed
15:50:48  Found 5 models, 12 tests, 7 sources, 0 exposures, 0 metrics, 661 macros, 0 groups, 0 semantic models
15:50:48  
15:50:50  Concurrency: 2 threads (target='dev')
15:50:50  
15:50:50  1 of 2 START sql table model mab_nsfw.multi_label_v1 ........................... [RUN]
15:50:50  2 of 2 START python table model mab_nsfw.multi_label_v2 ........................ [RUN]
15:50:54  1 of 2 OK created sql table model mab_nsfw.multi_label_v1 ...................... [CREATE TABLE (84.1k rows, 10.5 MiB processed) in 4.40s]

Environment

- OS: macos
- Python: 3.11.1
- dbt-core: 1.6.6
- dbt-bigquery: 1.6.7

Additional Context

No response

@gbmarc1 gbmarc1 added bug Something isn't working triage labels Oct 12, 2023
@github-actions github-actions bot changed the title submission_method ignored in profile (dbt-bigquery) [ADAP-945] submission_method ignored in profile (dbt-bigquery) Oct 12, 2023
@dbeatty10
Contributor

Thanks for reporting this @gbmarc1

It sounds like this didn't work for you:

def model(dbt, session):
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

But this did work:

def model(dbt, session):
    dbt.config(
        submission_method="cluster",
        dataproc_cluster_name="ml-adhoc-dataproc-us-central1",
    )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

To help troubleshoot

Did you happen to try either of these as well? This could help nail down where the missing piece(s) might be.

Configuring submission_method only:

def model(dbt, session):
    dbt.config(
        submission_method="cluster",
    )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

Or configuring dataproc_cluster_name only:

def model(dbt, session):
    dbt.config(
        dataproc_cluster_name="ml-adhoc-dataproc-us-central1",
    )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

@gbmarc1
Author

gbmarc1 commented Oct 12, 2023

Hello,
Thanks for looking at this! :)

It seems the profile's submission_method gets ignored.

  • Configuring submission_method only: works 👍
  • Configuring dataproc_cluster_name only: does not work 👎
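One reading consistent with these results: dataproc_cluster_name already falls back to the profile, but submission_method does not, so it defaults to serverless unless the model sets it. The sketch below models that hypothesis only; observed_lookup and runs_on_cluster are hypothetical names, and this is not dbt-bigquery's actual code.

```python
from typing import Any, Mapping, Optional, Tuple

# Relevant keys from the profiles.yml in this issue.
PROFILE = {
    "submission_method": "cluster",
    "dataproc_cluster_name": "ml-adhoc-dataproc-us-central1",
}


def observed_lookup(
    model_config: Mapping[str, Any], profile: Mapping[str, Any]
) -> Tuple[str, Optional[str]]:
    """Hypothetical model of the reported behavior, not dbt-bigquery's code.

    submission_method is read from the model config only (default "serverless"),
    while dataproc_cluster_name falls back to the profile.
    """
    method = model_config.get("submission_method", "serverless")
    cluster = model_config.get("dataproc_cluster_name", profile.get("dataproc_cluster_name"))
    return method, cluster


def runs_on_cluster(model_config: Mapping[str, Any]) -> bool:
    """True if the hypothetical lookup would submit the job to the cluster."""
    method, cluster = observed_lookup(model_config, PROFILE)
    return method == "cluster" and cluster is not None


# The four model configurations tried in this thread.
cases = {
    "no dbt.config": {},
    "both keys": {
        "submission_method": "cluster",
        "dataproc_cluster_name": "ml-adhoc-dataproc-us-central1",
    },
    "submission_method only": {"submission_method": "cluster"},
    "dataproc_cluster_name only": {"dataproc_cluster_name": "ml-adhoc-dataproc-us-central1"},
}

for name, cfg in cases.items():
    print(f"{name}: {'cluster' if runs_on_cluster(cfg) else 'serverless'}")
```

Running it prints serverless for the original model and for the dataproc_cluster_name-only variant, and cluster for the other two, which matches the outcomes reported in this thread.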

@dbeatty10 dbeatty10 changed the title [ADAP-945] submission_method ignored in profile (dbt-bigquery) [ADAP-945] [Bug] submission_method from dbt profile not being applied to dbt Python models Oct 12, 2023
@dbeatty10
Contributor

Thanks @gbmarc1 -- that gives us the info we need 👍

Acceptance criteria

As noted in the original issue, dbt should use the cluster submission method (rather than serverless) when using the following project files:

profiles.yml

ml:
  target: dev
  outputs:
    dev: &dev_config
      type: bigquery
      dataset: "{{ env_var('USER') }}"
      project: shopify-ml-adhoc
      priority: interactive
      method: oauth
      location: US
      job_execution_timeout_seconds: 600
      job_retries: 1
      threads: 2
      submission_method: cluster
      dataproc_region: us-central1
      gcs_bucket: ml-adhoc-dataproc-jobs
      dataproc_cluster_name: ml-adhoc-dataproc-us-central1

models/my_model

def model(dbt, session):
    dbt.config(
        dataproc_cluster_name="ml-adhoc-dataproc-us-central1",
    )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

Relevant code

@dbeatty10 dbeatty10 removed the triage label Oct 12, 2023
@martynydbt martynydbt added the help_wanted Extra attention is needed label Feb 8, 2024
Projects
None yet
Development

No branches or pull requests

3 participants