Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADAP-464: Add allow_non_incremental_definition option to BigQuery materialized views #1011

Closed
wants to merge 10 commits into from

Conversation

bnaul
Copy link

@bnaul bnaul commented Nov 8, 2023

Closes #672.

Problem

Non-incremental materialized views are much more flexible than incremental and have a lot of uses: see e.g. https://medium.com/google-cloud/hidden-gems-of-bigquery-part-4-five-types-of-views-ddfbc05118fb#bc99

Solution

Add allow_non_incremental_definition flag following the pattern for other fields. The only wrinkle is that the Python client library doesn't yet have a property for this variable, so it's being pulled directly from the properties dict instead.

I'm assuming the existing tests for materialized views will exercise this property as well (with the default False), if it seems necessary to add a special test with the flag set to True let me know.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@bnaul bnaul requested a review from a team as a code owner November 8, 2023 15:53
Copy link

cla-bot bot commented Nov 8, 2023

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @bnaul

Copy link
Contributor

@mikealfare mikealfare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I left a few comments that I suspect are straight forward updates.

This would also need a changelog (run changie new) and a functional test. You can likely update some existing tests to just include this new setting. Or you could write a new one if that's easier.

dbt/adapters/bigquery/relation_configs/_options.py Outdated Show resolved Hide resolved
@mikealfare mikealfare self-assigned this Nov 8, 2023
@cla-bot cla-bot bot added the cla:yes label Nov 9, 2023
@bnaul
Copy link
Author

bnaul commented Nov 9, 2023

One more question @mikealfare, as the issue mentions this flag can't be set via ALTER, only CREATE. Could you point me to where these distinctions are encoded?

Lifecycle : default except Refresh is bypassed if auto_refresh
on_configuration_change.apply : enable_refresh/refresh_interval_minutes/max_staleness via ALTER, rest via DROP/CREATE

@mikealfare
Copy link
Contributor

One more question @mikealfare, as the issue mentions this flag can't be set via ALTER, only CREATE. Could you point me to where these distinctions are encoded?

Lifecycle : default except Refresh is bypassed if auto_refresh
on_configuration_change.apply : enable_refresh/refresh_interval_minutes/max_staleness via ALTER, rest via DROP/CREATE

Good catch, I missed this. This complicates things. This basically means if the value changes, you'll need a full refresh to re-create the materialized view. There's a change object that has a requires_full_refresh calculated property here:

class BigQueryOptionsConfigChange(RelationConfigChange):
context: BigQueryOptionsConfig
@property
def requires_full_refresh(self) -> bool:
return False
. Prior to this PR, all options changes were able to be changed on existing objects, hence the hard-coded False. Now we need to see if that changed. The problem is that this object only has the new options on it. These get set here:
if new_materialized_view.options != existing_materialized_view.options:
config_change_collection.options = BigQueryOptionsConfigChange(
action=RelationConfigChangeAction.alter,
context=new_materialized_view.options,
)
. The approach I would take is to set the action to be alter if this new option does not change, or set it to be create if it does (in the relation file). Then condition on the action in the calculated property in requires_full_refresh to determine if a full refresh is necessary. Beyond that, I'm not sure what will happen if you submit an alter statement with this setting, even if it's the same. So you might need to pop off this option before passing it to the alter template (but not before passing it to the create template).

@mikealfare
Copy link
Contributor

Also, given that this option acts differently and should generate a full refresh, we should add a test for that. You could see how I did that here: https://github.com/dbt-labs/dbt-bigquery/blob/main/tests/functional/adapter/materialized_view_tests/test_materialized_view_cluster_changes.py. Basically swap out the clustering-related pieces for the new setting.

@mikealfare mikealfare changed the title Add allow_non_incremental_definition option to BigQuery materialized views ADAP-464: Add allow_non_incremental_definition option to BigQuery materialized views Nov 9, 2023
@bnaul
Copy link
Author

bnaul commented Nov 12, 2023

Seems doable, one other thing I'm hitting is that max_staleness is currently ignored (always parsed as None), it's present on the Python table object but as a JSON string representation that can convert to a relativedelta. The standard way a user would define it though would just be like INTERVAL 60 MINUTES, not 0-0 60:0:0. The interval literal rules seem a bit complicated https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#interval_type, is there anywhere that handles this kind of thing already or would comparing user- and BQ API-provided intervals be a new addition here?

@mikealfare
Copy link
Contributor

Seems doable, one other thing I'm hitting is that max_staleness is currently ignored (always parsed as None), it's present on the Python table object but as a JSON string representation that can convert to a relativedelta. The standard way a user would define it though would just be like INTERVAL 60 MINUTES, not 0-0 60:0:0. The interval literal rules seem a bit complicated https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#interval_type, is there anywhere that handles this kind of thing already or would comparing user- and BQ API-provided intervals be a new addition here?

No, there's nowhere that handles this, and I believe the issue you identified (the complexity) is why we moved forward while excluding it from the comparison. I haven't finished the docs for 1.7 yet, but that's certainly a caveat. The workaround is to use --full-refresh.

@jess-gf
Copy link

jess-gf commented Feb 13, 2024

Hi guys,
Thank you so much for this pull request.
Do you have any updates if this change is going to be merged soon? =)
Thanks

@mikealfare mikealfare added the community This PR is from a community member label Jun 7, 2024
Copy link
Contributor

@mikealfare mikealfare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @bnaul! This looks mostly complete. I had a few nits and few small changes that I'd like to make before merging this. Let me know if you have any concerns with my responses.

@mikealfare
Copy link
Contributor

This PR will be continued in #1255. I'm closing this to avoid confusion, however the work will continue. Please refer to #1255 moving forward.

@mikealfare mikealfare closed this Jun 11, 2024
mikealfare added a commit that referenced this pull request Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes community This PR is from a community member ok to test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ADAP-464] Materialized Views - Non-incremental support
4 participants