Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Precision argument to equality for numeric and float type columns #757

Closed
rlh1994 opened this issue Jan 12, 2023 · 5 comments · Fixed by #765
Closed

Add Precision argument to equality for numeric and float type columns #757

rlh1994 opened this issue Jan 12, 2023 · 5 comments · Fixed by #765
Labels
enhancement New feature or request

Comments

@rlh1994
Copy link
Contributor

rlh1994 commented Jan 12, 2023

Describe the feature

It would be helpful to have an argument to the equality test that would specify the precision required for equality for numeric and float type columns by first rounding them before comparison. Currently equality can fail on seemingly identical tables due to precision errors as part of processing.

Describe alternatives you've considered

Removing these columns from the comparison, or implementing an overwritten macro to do this myself.

Additional context

Who will this benefit?

Anyone using the equality test with numeric/float type columns.

Are you interested in contributing this feature?

I have a working solution that I use locally based off the existing equality code, it works fine but is maybe not the cleanest way to do it.

{% test equality(model, compare_model, compare_columns=None, precision = None) %}
  {{ return(adapter.dispatch('test_equality', 'dbt_utils')(model, compare_model, compare_columns, precision)) }}
{% endtest %}

{% macro default__test_equality(model, compare_model, compare_columns=None, precision = None) %}

{% set set_diff %}
    count(*) + coalesce(abs(
        sum(case when which_diff = 'a_minus_b' then 1 else 0 end) -
        sum(case when which_diff = 'b_minus_a' then 1 else 0 end)
    ), 0)
{% endset %}

{#-- Needs to be set at parse time, before we return '' below --#}
{{ config(fail_calc = set_diff) }}

{#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #}
{%- if not execute -%}
    {{ return('') }}
{% endif %}

-- setup
{%- do dbt_utils._is_relation(model, 'test_equality') -%}

{%- if not precision -%}
    {#-
        If the compare_cols arg is provided, we can run this test without querying the
        information schema — this allows the model to be an ephemeral model
    -#}
    {%- if not compare_columns -%}
        {%- do dbt_utils._is_ephemeral(model, 'test_equality') -%}
        {%- set compare_columns = adapter.get_columns_in_relation(model) | map(attribute='quoted') -%}
    {%- endif -%}

    {% set compare_cols_csv = compare_columns | join(', ') %}
{% else %}
    {#-
        If rounding is required, we need to get the types, so it can't be ephermeral
    -#}
    {%- do dbt_utils._is_ephemeral(model, 'test_equality') -%}
    {%- set columns = adapter.get_columns_in_relation(model) -%}

    {% set columns_list = [] %}
    {%- for col in columns -%}
        {%- if (compare_columns and col.name|lower in compare_columns|map('lower')) or not compare_columns -%}
            {%- if col.is_float() or col.is_numeric() -%}
                {# Cast is required due to postgres not having round for a double precision number #}
                {%- do columns_list.append('round(cast(' ~ col.name ~ ' as ' ~ dbt.type_numeric() ~ '),' ~ precision ~ ') as ' ~ col.name) -%}
            {%- else -%}
                {%- do columns_list.append(col.name) -%}
            {%- endif -%}
        {% endif %}
    {%- endfor -%}

    {% set compare_cols_csv = columns_list | join(', ') %}
{% endif %}

with a as (

    select * from {{ model }}

),

b as (

    select * from {{ compare_model }}

),

a_minus_b as (

    select {{compare_cols_csv}} from a
    {{ dbt.except() }}
    select {{compare_cols_csv}} from b

),

b_minus_a as (

    select {{compare_cols_csv}} from b
    {{ dbt.except() }}
    select {{compare_cols_csv}} from a

),

unioned as (

    select 'a_minus_b' as which_diff, a_minus_b.* from a_minus_b
    union all
    select 'b_minus_a' as which_diff, b_minus_a.* from b_minus_a

)

select * from unioned

{% endmacro %}
@joellabes
Copy link
Contributor

@rlh1994 conceptually I am into this, as long as it works in a backwards-compatible way. Offhand, I'm not sure how many tests this already has, but part of this PR would likely include fleshing that out to add a bunch of cases for the newly supported options (probably including some negative tests, i.e. making sure that it correctly fails).

I haven't reviewed this in depth, but gave it a skim and it looks like it's significantly less hassle than I would have assumed! I thought that the cross-database support was going to be a nightmare 💀

Feel free to spin up a PR and let's get some testing attached to it!

@github-actions
Copy link

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jul 31, 2023
@rlh1994
Copy link
Contributor Author

rlh1994 commented Jul 31, 2023

You shall not close!

@github-actions github-actions bot removed the Stale label Aug 1, 2023
Copy link

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 29, 2024
@rlh1994
Copy link
Contributor Author

rlh1994 commented Jan 29, 2024

Uno-Reverse-Card-Blue-Classic-Uno-Unorules org_ png

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants