Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

equal_rowcount behavior for empty table in BigQuery #766

Closed
1 of 5 tasks
bsmith925 opened this issue Feb 6, 2023 · 7 comments
Closed
1 of 5 tasks

equal_rowcount behavior for empty table in BigQuery #766

bsmith925 opened this issue Feb 6, 2023 · 7 comments
Labels
bug Something isn't working good first issue Stale

Comments

@bsmith925
Copy link

bsmith925 commented Feb 6, 2023

Describe the bug

When querying an empty table in BigQuery, a message is returned saying "Results not found" rather than a count of 0. This results in the test erroring on type validation in the return schema.

Steps to reproduce

Create two table in BigQuery without adding any data to at least one of them.
Use the following in your YAML file:
`models:

  • name: table_one
    tests:
    • tests:
      • dbt_utils.equal_rowcount:
        compare_model: ref('model2')`

Expected results

The test should not fail if both tables are empty. The test should fail if if one table is populated and the other is not, but return the
difference in rowcount.

Screenshots and log output

None is not of type 'integer'
14:52:29
14:52:29 Failed validating 'type' in schema['properties']['failures']:
14:52:29 {'type': 'integer'}
14:52:29
14:52:29 On instance['failures']:
14:52:29 None

System information

The contents of your packages.yml file:
packages:

  • package: dbt-labs/dbt_utils
    version: 1.0.0

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

Core:
  - installed: 1.3.1
  - latest:    1.4.1 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - bigquery:  1.3.0 - Update available!

Are you interested in contributing the fix?

I can contribute, but am busy for the next couple of weeks.

@bsmith925 bsmith925 added bug Something isn't working triage labels Feb 6, 2023
@FrankTub
Copy link

I can confirm that the same behavior happens on Postgres. The problem is the following.

A query like below is generated.

with a as (

    select 
      
      1 as id_dbtutils_test_equal_rowcount,
      count(*) as count_a 
    from "staging"."datamarts"."fct_monthly_weighted_average_lives"
    group by 1


),
b as (

    select 
      
      1 as id_dbtutils_test_equal_rowcount,
      count(*) as count_b 
    from "staging"."transformations"."monthly_weighted_average_lives"
    group by id_dbtutils_test_equal_rowcount

),
final as (

    select
    
        a.id_dbtutils_test_equal_rowcount as id_dbtutils_test_equal_rowcount_a,
          b.id_dbtutils_test_equal_rowcount as id_dbtutils_test_equal_rowcount_b,
        

        count_a,
        count_b,
        abs(count_a - count_b) as diff_count

    from a
    full join b
    on
    a.id_dbtutils_test_equal_rowcount = b.id_dbtutils_test_equal_rowcount
    


)

select * from final

But the group by 1 results in no rows returned in combination with a count(*) statement. So the solution would be to modify the query that is generated to something like below.

with a as (
    select 
      count(*) as count_a 
    from "staging"."datamarts"."fct_monthly_weighted_average_lives"
),
b as (
    select 
      count(*) as count_b 
    from "staging"."transformations"."monthly_weighted_average_lives"
),
final as (

    select
        count_a,
        count_b,
        abs(count_a - count_b) as diff_count
    from a
    cross join b

)
select * from final

@joellabes
Copy link
Contributor

Hey @bsmith925 and @FrankTub, thanks for reporting this! I'd welcome a PR to fix it up, along with a new test which validates that the test behaves correctly against an empty table

@danhphan
Copy link

Hi @joellabes I am happy to work on this task. Thank you.

@tatumbrannan
Copy link

Hello, I am currently experiencing the same error. Has there been any advancements on this bug?

@vinit2107
Copy link

@joellabes - I have tried to resolve this issue and raised PR #830.

The solution discussed above using cross join will only work if we do not provide the group by columns and the macro has the capability to add group by columns, and doing a cross join on the resultant CTE will cause issues.
I tried to resolve the problem by removing the group by clause in the resultant query of the macro. The group by clause will only be added if the user specifies a group by column. This fix does avoid the issue that @FrankTub is facing, but not completely because the macro will still fail if the user provides a group by column on two empty tables. It'll fail with the same error as described above. We can wrap it around an additional CTE but that defeats the purpose/capability to add "group by" columns.

Please let me know if it's an acceptable solution or if I need to change anything else. This is my first contribution.

Thanks!

Copy link

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Feb 29, 2024
Copy link

github-actions bot commented Mar 8, 2024

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Stale
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants