Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] BQ quota_project defaults to the account/user's environment default #1347

Open
2 tasks done
jcarpenter12 opened this issue Sep 12, 2024 · 3 comments · May be fixed by #1345
Open
2 tasks done

[Bug] BQ quota_project defaults to the account/user's environment default #1347

jcarpenter12 opened this issue Sep 12, 2024 · 3 comments · May be fixed by #1345
Labels
enhancement New feature or request

Comments

@jcarpenter12
Copy link

jcarpenter12 commented Sep 12, 2024

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When a user or service account's default quota_project_id is set to a project other than the execution_project specified in the profiles.yml DBT will use this as the quota_project_id in the underlying request opposed to anything set in the profiles.yml. This is a side effect and will mean that a DBT will fail to connect to BigQuery if the user does not have BigQuery API enabled in the project their credentials default to. It will also mean that certain quota limits are being used up in the default project.

You can see the default quota project_id from the contents of your application_default_credentials file. It should look something like this. A similar structure applies to the contents of service account keys I believe

{
  "account": "",
  "client_id": "xyz.apps.googleusercontent.com",
  "client_secret": "xyz",
  "quota_project_id": $MY_QUOTA_PROJECT,
  "refresh_token": "xyz",
  "type": "authorized_user",
  "universe_domain": "googleapis.com"
}

This issue arose when running a CI/CD pipeline that is defined in a project separate to either the project or execution_project and did not have the BQ API enabled, so it failed to connect.

Expected Behavior

What I think should be happening according to the docs

As this is a side effect the expected behaviour is a bit difficult to define but DBT should either set execution_project as the quota_project_id in the BQ client or allow the user to override it themselves in the profiles.yml.

In my opinion the user should be able to specify a separate optional quota_project to use separate from project and execution project. By not setting it, it will default to what it does now and not change the behaviour. Feature Request here #1343

Steps To Reproduce

  1. Create a GCP quota Project and do not enable the BQ API in it. This is to ensure there is no quota to use for it
  2. Create a GCP data project, enable BQ API and create a dataset called foo
  3. Create a GCP execution project, enable BQ API
  4. Make sure you user has access to all three of the above with the appropriate BQ permissions
  5. Running your local gcloud run the following gcloud config set project $MY_QUOTA_PROJECT (replace with the name of step 1)
  6. Generate your local oauth credentials gcloud auth application-default login
  7. You should then be able to cat the contents of that file using cat ~/.config/gcloud/application_default_credentials. It should look something like the following
{
  "account": "",
  "client_id": "xyz.apps.googleusercontent.com",
  "client_secret": "xyz",
  "quota_project_id": $MY_QUOTA_PROJECT,
  "refresh_token": "xyz",
  "type": "authorized_user",
  "universe_domain": "googleapis.com"
}
  1. Setup a python virtualenv and pip install dbt-core==1.8.6 dbt-bigquery==1.8.2
  2. Create a profiles.yml configuration that contains the following and substitute the names of your GCP projects with those you have created
default:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      schema: foo
      location: europe-west2
      threads: 10
      timeout_seconds: 5000
      priority: interactive
      retries: 1
      project: $MY_DATA_PROJECT
      execution_project: $MY_EXECUTION_PROJECT
  1. Run the following dbt command to run the debug of the connection (update profiles-dir to wherever your profile is located
    dbt debug --connection --profiles-dir .

You should then see a log output like the below. Here you can see the quota project is causing this connection to fail even though it is not specified anywhere in the dbt profile itself and is instead being picked up from the environment.

Relevant log output

13:24:42  dbt was unable to connect to the specified database.
The database returned the following error:

  >Database Error
  BigQuery API has not been used in project $MY_QUOTA_PROJECT before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/bigquery.googleapis.com/overview?project=$MY_QUOTA_PROJECT then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.

Environment

- OS: macOS Ventura 13.2.1
- Python: 3.12.1
- dbt-core: 1.8.6
- dbt-bigquery: 1.8.2

Additional Context

This relates to #1343 a feature request to add quota_project and a potential PR for it here #1345

Docs PR : dbt-labs/docs.getdbt.com#6054

As noted on the #1343 I have not raised a change to a dbt adapter before so please do let me know if I am misunderstanding anything

@jcarpenter12 jcarpenter12 added bug Something isn't working triage labels Sep 12, 2024
@jcarpenter12 jcarpenter12 linked a pull request Sep 12, 2024 that will close this issue
4 tasks
@dataders
Copy link
Contributor

@jcarpenter12 amazing and thorough write up!

What I still need help on is:

  1. what the expected behavior of your reproducible example should be. Like what should be happening?

    Here you can see the quota project is causing this connection to fail even though it is not specified anywhere in the dbt profile itself and is instead being picked up from the environment

  2. disambiguating:
    1. your reproducible example that highlights that something funny is happening when multiple projects are set up
    2. the behavior discovered that workloads aren't properly attributed to quota projects

@jcarpenter12
Copy link
Author

@jcarpenter12 amazing and thorough write up!

What I still need help on is:

  1. what the expected behavior of your reproducible example should be. Like what should be happening?

    Here you can see the quota project is causing this connection to fail even though it is not specified anywhere in the dbt profile itself and is instead being picked up from the environment

  2. disambiguating:

    1. your reproducible example that highlights that something funny is happening when multiple projects are set up
    2. the behavior discovered that workloads aren't properly attributed to quota projects

Thanks for getting back to me so quickly @dataders.

As for 1. I have updated the expected behaviour section with what I believe should be happening according to the docs.

As for 2. you're right it's not very clear. In my example to prove that it is picking up the quota project from the environment I created a new project that does not have the BQ api enabled in it. This way dbt debug would fail the connection because there is no quota it can access but prove that it is incorrectly trying to use the quota project from the environment rather than from the profiles.yml.

If i had set a project in my gcloud using gcloud config set project $MY_QUOTA_PROJECT that did have the BQ api enabled in it the debug command would pass as it would be able to access the quota project. However, this would only disguise the issue as it would appear to be running okay but actually it's attributing the quota usage to the one set in the user's environment.

Please let me know if these aren't suitable and I'll come up with another way to demonstrate the issue as I see it.

@jcarpenter12 jcarpenter12 changed the title [Bug] execution_project is not the project that is billed for the query as outlined in the docs [Bug] execution_project is not the default for all BQ quotas depending on setup Sep 13, 2024
@jcarpenter12 jcarpenter12 changed the title [Bug] execution_project is not the default for all BQ quotas depending on setup [Bug] BQ quota_project defaults to the account/user's environment default Sep 13, 2024
@jcarpenter12
Copy link
Author

Updated the title and contents of the issue. I have been speaking to the maintainers on their slack channel about this and we have managed to narrow down the issue, and the original issue was somewhat misleading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants