[CT-1641] [CT-1622] [Bug] Insert overwrite incremental models using create or replace table #424

Open
roycefp opened this issue Dec 7, 2022 · 17 comments
Labels
bug Something isn't working

Comments

roycefp commented Dec 7, 2022

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When I run an incremental model using the insert_overwrite strategy with require_partition_filter=true and static partitions, dbt runs a create or replace table statement whose query is equivalent to select * from table.

This returns an error: Cannot query over table XX without a filter over column(s) XX that can be used for partition elimination
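For context, require_partition_filter=true makes BigQuery reject any query that lacks a predicate on the partitioning column. A minimal illustration (the table name is a placeholder):

```sql
-- Fails when require_partition_filter=true: no predicate on report_date.
select * from `my_project.my_dataset.my_table`;

-- Succeeds: the predicate on report_date allows partition elimination.
select *
from `my_project.my_dataset.my_table`
where report_date >= date_sub(current_date, interval 2 day);
```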

I observed this for several models, but others work fine, and there is no discernible difference in how these models are built.

Expected Behavior

The incremental run should execute a merge statement that overwrites only the specified partitions, rather than a create or replace table statement over the whole table.
Steps To Reproduce

I'll exclude the main body of the model for brevity, but the config params are:

{{
  config(
      materialized='incremental',
      incremental_strategy = "insert_overwrite",
      cluster_by="country",
      partition_by = {
          "field": "report_date",
          "data_type": "date",
          "granularity": "day"
      },
      partitions = ["DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)", "DATE_SUB(CURRENT_DATE, INTERVAL 2 DAY)"],
      require_partition_filter=true,
      on_schema_change="sync_all_columns"
  )
}}

After running the model, I looked at the logs:

  1. The first query created the tmp table correctly, although it's not used.
  2. The second query ran a create or replace table instead of a merge.

/* {"app": "dbt", "dbt_version": "1.1.2", "profile_name": "bigquery", "target_name": "prod", "node_id": "model_name"} */

  create or replace table `model_name`
  partition by report_date
  cluster by country
  OPTIONS(
      description="""table description""",
    
      labels=[('label')],
    
      require_partition_filter=True
    )
  as (
    
    select
          (all columns)
    from `model_name`
  );

An error is returned because the create or replace statement does not use a partition filter, but the statement is also wrong in the first place: it should be a merge statement.
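For comparison, the DML one would expect from the static insert_overwrite strategy is a merge that deletes and reloads only the listed partitions. A rough sketch of that shape (column names are taken from the config above; the tmp-table name and alias names are placeholders, and exact DDL varies by dbt-bigquery version):

```sql
merge into `model_name` as dbt_internal_dest
using (
    select * from `model_name__dbt_tmp`  -- tmp table built by the first query
) as dbt_internal_source
on false
when not matched by source
     and dbt_internal_dest.report_date in (
         date_sub(current_date, interval 1 day),
         date_sub(current_date, interval 2 day)
     )
    then delete
when not matched
    then insert (country, report_date /* , remaining columns */)
         values (country, report_date /* , remaining columns */)
```

The `on false` condition guarantees no rows ever match, so the statement reduces to "delete the listed partitions, then insert everything from the tmp table".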

Relevant log output

No response

Environment

- OS: Big Sur 11.6
- Python: 3.8.0
- dbt:1.1.2

Which database adapter are you using with dbt?

bigquery

Additional Context

No response

@roycefp roycefp added bug Something isn't working triage labels Dec 7, 2022
@github-actions github-actions bot changed the title [Bug] Insert overwrite incremental models using create or replace table [CT-1622] [Bug] Insert overwrite incremental models using create or replace table Dec 7, 2022
@dbeatty10 dbeatty10 transferred this issue from dbt-labs/dbt-core Dec 10, 2022
@github-actions github-actions bot changed the title [CT-1622] [Bug] Insert overwrite incremental models using create or replace table [CT-1641] [CT-1622] [Bug] Insert overwrite incremental models using create or replace table Dec 10, 2022
dbeatty10 (Contributor)

Thanks for reporting this @roycefp !

I took a first shot at reproducing the Cannot query over table ... error you saw, but I wasn't able to trigger it.

Can you try the code I have below and see if it generates an error on your side?

If it works without error, are you able to tweak it so that it does trigger the error?

models/incremental_424.sql

{{
    config(
        materialized="incremental",
        incremental_strategy="insert_overwrite",
        cluster_by="country",
        partition_by={
            "field": "report_date",
            "data_type": "date",
            "granularity": "day"
        },
        partitions=["DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)", "DATE_SUB(CURRENT_DATE, INTERVAL 2 DAY)"],
        require_partition_filter=true,
        on_schema_change="sync_all_columns"
    )
}}


with data as (

    {% if not is_incremental() %}

        select 1 as country, cast('2022-12-13' as date) as report_date union all
        select 2 as country, cast('2022-12-13' as date) as report_date union all
        select 3 as country, cast('2022-12-13' as date) as report_date union all
        select 4 as country, cast('2022-12-13' as date) as report_date

    {% else %}

        -- we want to overwrite the 4 records in the 2022-12-13 partition
        -- with the 2 records below, but add two more in the 2022-12-14 partition
        select 10 as country, cast('2022-12-13' as date) as report_date union all
        select 20 as country, cast('2022-12-13' as date) as report_date union all
        select 30 as country, cast('2022-12-14' as date) as report_date union all
        select 40 as country, cast('2022-12-14' as date) as report_date

    {% endif %}

)

select * from data

Run the following in order to do two iterations of the incremental model:

dbt run -s incremental_424 --full-refresh
dbt run -s incremental_424

I used dbt-core=1.1.2 and dbt-bigquery=1.1.1.

Fleid (Contributor) commented Feb 17, 2023

Hi @roycefp, since we can't reproduce, I will close this now.
Please comment below if needed and we will re-open.

@Fleid Fleid closed this as completed Feb 17, 2023
brebertus21 commented Mar 16, 2023

Hi @dbeatty10, looks like we're running into the same issue. With some further debugging, it appears that is_incremental() is returning false when we'd expect it to return true. Specifically, relation is none in the logic here. Similar to the example above, this affects only one model out of several incremental models.

Wondering if you have any tips on how relation is being determined here so we can dig deeper?

Currently running:
dbt-core==1.2.0 and dbt-bigquery==1.2.0
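For anyone digging in: the macro's logic is roughly the following (a paraphrase of dbt-core's implementation, not the verbatim source; details vary by version). It shows why a None relation from adapter.get_relation() forces the non-incremental branch:

```jinja
{% macro is_incremental() %}
    {#-- Paraphrased sketch of dbt-core's is_incremental(); not verbatim. --#}
    {% if not execute %}
        {{ return(false) }}
    {% else %}
        {% set relation = adapter.get_relation(this.database, this.schema, this.table) %}
        {{ return(relation is not none
                  and relation.type == 'table'
                  and model.config.materialized == 'incremental'
                  and not should_full_refresh()) }}
    {% endif %}
{% endmacro %}
```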

Fleid (Contributor) commented Mar 20, 2023

Re-opening!
@dbeatty10 do you still have context, or do you want me to take over?

dbeatty10 (Contributor)

@Fleid you can take over!

Here's the only other context that might be helpful:
It looks like is_incremental() uses adapter.get_relation(). dbt-labs/dbt-core#7024 is another recent issue involving an is_{something} macro and adapter.get_relation. @mikealfare was recently troubleshooting that one so might be worth asking him if there's similarities here.

Fleid (Contributor) commented Mar 20, 2023

So I will ask Mike to take a look at it, but he's super busy right now.
@brebertus21 you can follow up on the conversation in the issue Doug linked to. It's a complex one that we're having a tough time getting to the bottom of.
What it means is that it will take time for us to resolve this. In the meantime, do you have a workaround?

@Fleid Fleid added refinement Product or leadership input needed and removed triage labels Mar 20, 2023
brebertus21

Hey @Fleid, thanks for following up. I did some digging over the past few days and was able to reduce the failing model down to something like:

SELECT
  DATE("2023-03-10") as dt,
  "id" as user_id

and was still seeing issues. Eventually I changed the name of the ref itself from user_*.sql to bug_user.sql and, amazingly, that fixed the issue for this ref only. I then tried materializing it in a different schema, and everything ran successfully (including some downstream refs that had been failing with the same issue once the name workaround was in place).

This is in a development schema that my team has been using for almost a year, and it is by now cluttered with old artifacts. My hypothesis is that when dbt queries the schema metadata at query-render time, the operation is timing out (or similar) on the large schema, but the error is being caught and None is returned.
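One way to gauge how much metadata dbt has to scan is to count the objects in the dataset; a quick check (the project and dataset names are placeholders):

```sql
-- Count objects per type in the dataset dbt introspects at render time.
select table_type, count(*) as object_count
from `my_project.my_dev_schema.INFORMATION_SCHEMA.TABLES`
group by table_type
order by object_count desc;
```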

brebertus21

It was suspicious that the ref name made any difference, until I considered that maybe there is some inherent ordering in the metadata dbt is scanning.

@mikealfare mikealfare self-assigned this Mar 21, 2023
mikealfare (Contributor)

I'm assigning this to me for now to keep it on my radar, though there are things ahead of it as @Fleid points out. From a very cursory scan, it does appear related to the other adapter.get_relation() issue that we saw.

@Fleid Fleid removed the refinement Product or leadership input needed label Mar 24, 2023
roycefp (Author) commented Sep 14, 2023

Hello @mikealfare / @Fleid! Following up to check whether you have any updates, as we recently ran into the same issue again :)

Our interim solution has been to set require_partition_filter=false so that we're not forced into an error, but that's obviously a workaround rather than a fix, since we don't know the root cause.
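For concreteness, the interim workaround amounts to the original config with the flag flipped (other settings unchanged; note that with the flag off, BigQuery will no longer reject unfiltered full scans of the table):

```jinja
{{
  config(
      materialized='incremental',
      incremental_strategy='insert_overwrite',
      partition_by={
          "field": "report_date",
          "data_type": "date",
          "granularity": "day"
      },
      require_partition_filter=false
  )
}}
```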

should we be watching for this feature as a fix? dbt-labs/dbt-utils#833

cc: @shaznishoffie @chinwee-foodpanda

Fleid (Contributor) commented Oct 25, 2023

Yes @roycefp - the dbt-utils issue is the one to follow.
Sorry about the delay ><

dbeatty10 (Contributor)

@roycefp We don't actually know for certain if dbt-labs/dbt-utils#833 is related to #424 or not!

Our current barrier here is that we don't yet have a reproducible test case ("reprex"). Do you have any insights how we might be able to reproduce the situation you are seeing?

We're in a holding pattern until we can reproduce this one 😢

Labeling this as "awaiting_response" as an indicator that we're seeking help to reproduce this.

github-actions bot

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 31, 2024
github-actions bot commented Feb 8, 2024

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 8, 2024
Haebuk commented Apr 17, 2024

Hey, I can reproduce the error with @dbeatty10's example:

{{
    config(
        materialized="incremental",
        incremental_strategy="insert_overwrite",
        cluster_by="country",
        partition_by={
            "field": "report_date",
            "data_type": "datetime",
            "granularity": "day"
        },
        partitions=["DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)", "DATE_SUB(CURRENT_DATE, INTERVAL 2 DAY)"],
        require_partition_filter=true,
        on_schema_change="sync_all_columns"
    )
}}


with data as (

    {% if not is_incremental() %}

        select 1 as country, cast('2022-12-13 11:00:00' as datetime) as report_date union all
        select 2 as country, cast('2022-12-13 11:00:00' as datetime) as report_date union all
        select 3 as country, cast('2022-12-13 11:00:00' as datetime) as report_date union all
        select 4 as country, cast('2022-12-13 11:00:00' as datetime) as report_date

    {% else %}

        -- we want to overwrite the 4 records in the 2022-12-13 partition
        -- with the 2 records below, but add two more in the 2022-12-14 partition
        select 10 as country, cast('2022-12-13 11:00:00' as datetime) as report_date union all
        select 20 as country, cast('2022-12-13 11:00:00' as datetime) as report_date union all
        select 30 as country, cast('2022-12-14 11:00:00' as datetime) as report_date union all
        select 40 as country, cast('2022-12-14 11:00:00' as datetime) as report_date

    {% endif %}

)

select * from data

As you know, the original example works, but once date is converted to datetime, it fails.

Query error: Cannot query over table '<table>' without a filter over column(s) 'report_date' that can be used for partition elimination at [10:5]

dbeatty10 (Contributor)

Thanks for sharing that reprex in #424 (comment) @Haebuk !

When I tried running this command, I saw the same error message that both you and the original poster mentioned:

dbt run -s incremental_424

So I'm going to re-open this issue.

I'm still not sure if this is unexpected behavior or not, so I'm going to leave the triage label on for us to take a closer look.

Were you able to find any way to work around this?

@dbeatty10 dbeatty10 reopened this Apr 18, 2024
@dbeatty10 dbeatty10 removed the Stale label Apr 18, 2024
Haebuk commented May 27, 2024

@dbeatty10 Sorry for the late reply.
We couldn't find a solution :(
