[CT-1641] [CT-1622] [Bug] Insert overwrite incremental models using create or replace table #424
Comments
Thanks for reporting this @roycefp! I took a first shot at reproducing this. Can you try the code I have below and see if it generates an error on your side? If it works without error, are you able to tweak it so that it does trigger the error?
```sql
{{
    config(
        materialized="incremental",
        incremental_strategy="insert_overwrite",
        cluster_by="country",
        partition_by={
            "field": "report_date",
            "data_type": "date",
            "granularity": "day"
        },
        partitions=["DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)", "DATE_SUB(CURRENT_DATE, INTERVAL 2 DAY)"],
        require_partition_filter=true,
        on_schema_change="sync_all_columns"
    )
}}

with data as (

    {% if not is_incremental() %}

    select 1 as country, cast('2022-12-13' as date) as report_date union all
    select 2 as country, cast('2022-12-13' as date) as report_date union all
    select 3 as country, cast('2022-12-13' as date) as report_date union all
    select 4 as country, cast('2022-12-13' as date) as report_date

    {% else %}

    -- we want to overwrite the 4 records in the 2022-12-13 partition
    -- with the 2 records below, but add two more in the 2022-12-14 partition
    select 10 as country, cast('2022-12-13' as date) as report_date union all
    select 20 as country, cast('2022-12-13' as date) as report_date union all
    select 30 as country, cast('2022-12-14' as date) as report_date union all
    select 40 as country, cast('2022-12-14' as date) as report_date

    {% endif %}

)

select * from data
```

Run the following in order to do two iterations of the incremental model:

```shell
dbt run -s incremental_424 --full-refresh
dbt run -s incremental_424
```

I used …
Hi @roycefp, since we can't reproduce, I will close this now.
Hi @dbeatty10, looks like we're running into the same issue. With some further debugging, it appears that …. Wondering if you have any tips on how …. Currently running: …
Re-opening!
@Fleid you can take over! Here's the only other context that might be helpful: …
So I will ask Mike to take a look at it, but he's super busy right now.
Hey @Fleid, thanks for following up. Did some digging on this over the past few days. I was able to reduce the failing table down to something like … and was still seeing issues. Eventually ended up changing the name of the ref itself from ….

This is in a development schema that my team has been using for almost a year and that, at this point, is cluttered with old products. My hypothesis is that when dbt queries the schema metadata at query-render time, this operation is timing out (or similar) on the large schema, but that error is being caught and none is returned.
It was suspicious that the ref name made any difference, until I got to thinking that maybe there is some inherent ordering in the metadata dbt is scanning.
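That hypothesis fits how dbt decides between an incremental merge and a full rebuild: `is_incremental()` depends on a metadata lookup for the existing relation. A paraphrased sketch of the dbt-core macro (the exact code varies by dbt version):

```jinja
{% macro is_incremental() %}
    {% if not execute %}
        {{ return(False) }}
    {% else %}
        {#- look up the existing target table in warehouse/schema metadata -#}
        {% set relation = adapter.get_relation(this.database, this.schema, this.table) %}
        {{ return(relation is not none
                  and relation.type == 'table'
                  and model.config.materialized == 'incremental'
                  and not should_full_refresh()) }}
    {% endif %}
{% endmacro %}
```

If `adapter.get_relation` comes back empty for any reason (including a swallowed metadata error), `is_incremental()` returns `False` and dbt builds the model from scratch with `create or replace table ... as select ...`, which is exactly the statement the error message points at.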
I'm assigning this to me for now to keep it on my radar, though there are things ahead of it, as @Fleid points out. From a very cursory scan, it does appear related to the other ….
Hello @mikealfare / @Fleid! Following up on this to check if you have any updates, as we recently ran into the same issue again :) Our interim solution has been to set ….

Should we be watching for this feature as a fix? dbt-labs/dbt-utils#833
Yes @roycefp - the dbt-utils issue is the one to follow.
@roycefp We don't actually know for certain whether dbt-labs/dbt-utils#833 is related to #424 or not! Our current barrier here is that we don't yet have a reproducible test case ("reprex"). Do you have any insights into how we might be able to reproduce the situation you are seeing? We're in a holding pattern until we can reproduce this one 😢

Labeling this as "awaiting_response" as an indicator that we're seeking help to reproduce this.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue, or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Hey, I can reproduce the error with @dbeatty10's example:

```sql
{{
    config(
        materialized="incremental",
        incremental_strategy="insert_overwrite",
        cluster_by="country",
        partition_by={
            "field": "report_date",
            "data_type": "datetime",
            "granularity": "day"
        },
        partitions=["DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)", "DATE_SUB(CURRENT_DATE, INTERVAL 2 DAY)"],
        require_partition_filter=true,
        on_schema_change="sync_all_columns"
    )
}}

with data as (

    {% if not is_incremental() %}

    select 1 as country, cast('2022-12-13 11:00:00' as datetime) as report_date union all
    select 2 as country, cast('2022-12-13 11:00:00' as datetime) as report_date union all
    select 3 as country, cast('2022-12-13 11:00:00' as datetime) as report_date union all
    select 4 as country, cast('2022-12-13 11:00:00' as datetime) as report_date

    {% else %}

    -- we want to overwrite the 4 records in the 2022-12-13 partition
    -- with the 2 records below, but add two more in the 2022-12-14 partition
    select 10 as country, cast('2022-12-13 11:00:00' as datetime) as report_date union all
    select 20 as country, cast('2022-12-13 11:00:00' as datetime) as report_date union all
    select 30 as country, cast('2022-12-14 11:00:00' as datetime) as report_date union all
    select 40 as country, cast('2022-12-14 11:00:00' as datetime) as report_date

    {% endif %}

)

select * from data
```

As you know, the original example works, but converting the `date` column to `datetime` makes it fail:

Query error: Cannot query over table '&lt;table&gt;' without a filter over column(s) 'report_date' that can be used for partition elimination at [10:5]
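For reference, when static `partitions` are configured, the `insert_overwrite` strategy is expected to compile to a merge along these lines (a hand-written sketch with hypothetical table names, not the exact SQL dbt generates):

```sql
merge into `my_project`.`my_dataset`.`incremental_424` as DBT_INTERNAL_DEST
using (
    select * from `my_project`.`my_dataset`.`incremental_424__dbt_tmp`
) as DBT_INTERNAL_SOURCE
on FALSE

-- drop the rows currently in the statically declared partitions ...
when not matched by source
    and DBT_INTERNAL_DEST.report_date in (
        DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY),
        DATE_SUB(CURRENT_DATE, INTERVAL 2 DAY)
    )
    then delete

-- ... and insert the replacement rows produced by the model query
when not matched then insert (`country`, `report_date`)
    values (`country`, `report_date`)
```

The bug report is that, under the conditions above, dbt instead emits a plain `create or replace table ... as select * from ...`, which carries no filter on `report_date` and therefore violates `require_partition_filter`.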
Thanks for sharing that reprex in #424 (comment) @Haebuk! When I tried running this command, I saw the same error message that both you and the original poster mentioned:

```shell
dbt run -s incremental_424
```

So I'm going to re-open this issue. I'm still not sure if this is unexpected behavior or not, so I'm going to leave the ….

Were you able to find any way to work around this?
@dbeatty10 Sorry for the late reply, …
Is this a new bug in dbt-core?

Current Behavior

When I run an incremental table update using the `insert_overwrite` strategy with `require_partition_filter=true` and static `partitions`, the table runs a `create or replace table` statement with a query syntax equivalent to (select * from table). This returns an error: Cannot query over table XX without a filter over column(s) XX that can be used for partition elimination.

I observed this for several models, but others work fine. There is no difference in how these models are built.

Expected Behavior

Instead of a `create or replace table` statement, I expect a merge statement to be run (i.e. merge into table XX as DBT_INTERNAL_DEST using (query)), since this is an incremental table and the target table exists/is not empty.

Steps To Reproduce

I'll exclude the main body of the model for brevity, but the config params are:

After running the model, I looked at the logs. The error is returned because the `create or replace` statement is not using the partition filter, but it is also not correct in the first place, as it should be a merge statement.
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
bigquery
Additional Context
No response