Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAP-1007] [Feature] Cast dataproc batch runtime_config properties to strings #1001

Open
2 tasks done
jimmyshah opened this issue Nov 6, 2023 · 6 comments
Open
2 tasks done
Labels
enhancement New feature or request good_first_issue Good for newcomers

Comments

@jimmyshah
Copy link

jimmyshah commented Nov 6, 2023

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

While trying to configure Dataproc Serverless jobs via profile.yml, I keep getting an error unless I use quotes around each key-value pair, which doesn't match dbt's documentation.

What works for me:

runtime_config:
          properties:
           'spark.executor.instances': '4'
            'spark.driver.memory': '4g'

Expected Behavior

According to dbt's documentation, Dataproc Serverless jobs can be configured in the dbt profiles.yml file, without quotes. In the example provided on the site, runtime_config properties can be specified in yaml as such:

runtime_config:
          properties:
            spark.executor.instances: 3
            spark.driver.memory: 1

Steps To Reproduce

  1. You have dbt python models already in your project
  2. Add runtime_config properties for serverless jobs without quotes
  3. Run your dbt-py model

Relevant log output

Unable to parse dataproc_batch as valid batch specification. See https://cloud.google.com/dataproc-serverless/docs/reference/rpc/google.cloud.dataproc.v1#google.cloud.dataproc.v1.Batch. Failed to parse runtime_config field: Failed to parse properties field: expected string or bytes-like object..

Environment

- OS: Mac OS Sonoma 14.1
- Python: 3.9.6
- dbt-core: 1.6.2
- dbt-bigquery: 1.6.4

Additional Context

No response

@jimmyshah jimmyshah added bug Something isn't working triage labels Nov 6, 2023
@github-actions github-actions bot changed the title [Bug] Dataproc batch runtime config properties require quotes [ADAP-1007] [Bug] Dataproc batch runtime config properties require quotes Nov 6, 2023
@dbeatty10
Copy link
Contributor

Thanks for reporting this @jimmyshah !

Did only single quotes work for you, or did double quotes work also, like this?

        runtime_config:
          properties:
            "spark.executor.instances": "4"
            "spark.driver.memory": "4g"

Does this work for you?

        runtime_config:
          properties:
            spark.executor.instances: '4'
            spark.driver.memory: '4g'

Or did the spark.executor.instances and spark.driver.memory keys need to be quoted as well?

Ultimately, I'm curious if maybe only integers need to be quoted, but everything else doesn't need to be quoted?

@jimmyshah
Copy link
Author

Hi @dbeatty10!

  • Example 1: Double quotes also works
  • Example 2: Only quoting the values also works

I also tried having only the '4' in quotes, and that also worked. This makes me think your theory is correct, that only integers need to be quoted.

@dbeatty10
Copy link
Contributor

Thanks for checking all those @jimmyshah !

I'm going to relabel this as a feature since technically dbt-biquery is working properly but isn't user-friendly. The docs are erroneous though.

To address both short-term and long-term, we should do the following:

  1. Update the documentation to show quotes around integers within properties of runtime_config
  2. Cast to a string within dbt-bigquery regardless if the user quotes or not

1. Update the documentation

I opened a PR to update the documentation:

2. Cast to a string within dbt-bigquery

Replace the passthrough serialization here with a custom serialization [1, 2] that serializes properties values to str.

Alternatively, casting values of properties to string can be done here.

@dbeatty10 dbeatty10 added enhancement New feature or request and removed bug Something isn't working triage labels Nov 7, 2023
@dbeatty10 dbeatty10 changed the title [ADAP-1007] [Bug] Dataproc batch runtime config properties require quotes [ADAP-1007] [Feature] Cast dataproc batch runtime_config properties to strings Nov 7, 2023
@jimmyshah
Copy link
Author

Thank you for the quick and thorough response @dbeatty10!

It looks like your docs PR has been merged, so the short-term fix should be in!

That said, I'm happy to raise a PR to address the underlying issue. Thank you for providing some guidance on how to best implement the change. I am leaning towards the second solution you proposed of casting the properties to strings.

@dbeatty10
Copy link
Contributor

Awesome @jimmyshah would be great if you want to raise a PR for this.

As a heads-up, the test case(s) always seem to be the most time-consuming part in the code review process.

Other things to be aware of:

  • Add resolves #1001 to the top of the PR description to link it to this issue
  • Install changie and run changie new as described here in order to create a changelog entry

@dbeatty10 dbeatty10 added the good_first_issue Good for newcomers label Nov 7, 2023
@jimmyshah
Copy link
Author

jimmyshah commented Nov 7, 2023

Hi @dbeatty10! Thank you for the heads up, definitely helpful!

I opened PR #1008 that (I think?) resolves the issue! For changie, I created a feature entry since we changed this issue from a bug to a feature.

My formatter/editor settings may have caused more changes than I intended, my apologies. I only meant to add these lines

If there is anything else I can do from my end, let me know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good_first_issue Good for newcomers
Projects
None yet
2 participants