Add Precision argument to `equality` for numeric and float type columns #757

rlh1994 · 2023-01-12T11:35:21Z

Describe the feature

It would be helpful to have an argument to the equality test that would specify the precision required for equality for numeric and float type columns by first rounding them before comparison. Currently equality can fail on seemingly identical tables due to precision errors as part of processing.

Describe alternatives you've considered

Removing these columns from the comparison, or implementing an overwritten macro to do this myself.

Additional context

Who will this benefit?

Anyone using the equality test with numeric/float type columns.

Are you interested in contributing this feature?

I have a working solution that I use locally based off the existing equality code, it works fine but is maybe not the cleanest way to do it.

{% test equality(model, compare_model, compare_columns=None, precision = None) %}
  {{ return(adapter.dispatch('test_equality', 'dbt_utils')(model, compare_model, compare_columns, precision)) }}
{% endtest %}

{% macro default__test_equality(model, compare_model, compare_columns=None, precision = None) %}

{% set set_diff %}
    count(*) + coalesce(abs(
        sum(case when which_diff = 'a_minus_b' then 1 else 0 end) -
        sum(case when which_diff = 'b_minus_a' then 1 else 0 end)
    ), 0)
{% endset %}

{#-- Needs to be set at parse time, before we return '' below --#}
{{ config(fail_calc = set_diff) }}

{#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #}
{%- if not execute -%}
    {{ return('') }}
{% endif %}

-- setup
{%- do dbt_utils._is_relation(model, 'test_equality') -%}

{%- if not precision -%}
    {#-
        If the compare_cols arg is provided, we can run this test without querying the
        information schema — this allows the model to be an ephemeral model
    -#}
    {%- if not compare_columns -%}
        {%- do dbt_utils._is_ephemeral(model, 'test_equality') -%}
        {%- set compare_columns = adapter.get_columns_in_relation(model) | map(attribute='quoted') -%}
    {%- endif -%}

    {% set compare_cols_csv = compare_columns | join(', ') %}
{% else %}
    {#-
        If rounding is required, we need to get the types, so it can't be ephermeral
    -#}
    {%- do dbt_utils._is_ephemeral(model, 'test_equality') -%}
    {%- set columns = adapter.get_columns_in_relation(model) -%}

    {% set columns_list = [] %}
    {%- for col in columns -%}
        {%- if (compare_columns and col.name|lower in compare_columns|map('lower')) or not compare_columns -%}
            {%- if col.is_float() or col.is_numeric() -%}
                {# Cast is required due to postgres not having round for a double precision number #}
                {%- do columns_list.append('round(cast(' ~ col.name ~ ' as ' ~ dbt.type_numeric() ~ '),' ~ precision ~ ') as ' ~ col.name) -%}
            {%- else -%}
                {%- do columns_list.append(col.name) -%}
            {%- endif -%}
        {% endif %}
    {%- endfor -%}

    {% set compare_cols_csv = columns_list | join(', ') %}
{% endif %}

with a as (

    select * from {{ model }}

),

b as (

    select * from {{ compare_model }}

),

a_minus_b as (

    select {{compare_cols_csv}} from a
    {{ dbt.except() }}
    select {{compare_cols_csv}} from b

),

b_minus_a as (

    select {{compare_cols_csv}} from b
    {{ dbt.except() }}
    select {{compare_cols_csv}} from a

),

unioned as (

    select 'a_minus_b' as which_diff, a_minus_b.* from a_minus_b
    union all
    select 'b_minus_a' as which_diff, b_minus_a.* from b_minus_a

)

select * from unioned

{% endmacro %}

The text was updated successfully, but these errors were encountered:

joellabes · 2023-01-31T05:18:55Z

@rlh1994 conceptually I am into this, as long as it works in a backwards-compatible way. Offhand, I'm not sure how many tests this already has, but part of this PR would likely include fleshing that out to add a bunch of cases for the newly supported options (probably including some negative tests, i.e. making sure that it correctly fails).

I haven't reviewed this in depth, but gave it a skim and it looks like it's significantly less hassle than I would have assumed! I thought that the cross-database support was going to be a nightmare 💀

Feel free to spin up a PR and let's get some testing attached to it!

github-actions · 2023-07-31T01:45:43Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

rlh1994 · 2023-07-31T06:35:50Z

You shall not close!

github-actions · 2024-01-29T01:44:00Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

rlh1994 · 2024-01-29T07:30:13Z

rlh1994 added enhancement New feature or request triage labels Jan 12, 2023

rlh1994 mentioned this issue Jan 12, 2023

Fix/fix incorrect aliasing snowplow/dbt-snowplow-mobile#59

Merged

2 tasks

rlh1994 mentioned this issue Feb 6, 2023

add precision + exclude_columns option to equality test #765

Merged

18 tasks

github-actions bot added the Stale label Jul 31, 2023

github-actions bot removed the Stale label Aug 1, 2023

github-actions bot added the Stale label Jan 29, 2024

github-actions bot removed the Stale label Jan 30, 2024

gwenwindflower closed this as completed in #765 Mar 5, 2024

dbeatty10 removed the triage label Jun 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Precision argument to `equality` for numeric and float type columns #757

Add Precision argument to `equality` for numeric and float type columns #757

rlh1994 commented Jan 12, 2023 •

edited

Loading

joellabes commented Jan 31, 2023

github-actions bot commented Jul 31, 2023

rlh1994 commented Jul 31, 2023

github-actions bot commented Jan 29, 2024

rlh1994 commented Jan 29, 2024

Add Precision argument to equality for numeric and float type columns #757

Add Precision argument to equality for numeric and float type columns #757

Comments

rlh1994 commented Jan 12, 2023 • edited Loading

Describe the feature

Describe alternatives you've considered

Additional context

Who will this benefit?

Are you interested in contributing this feature?

joellabes commented Jan 31, 2023

github-actions bot commented Jul 31, 2023

rlh1994 commented Jul 31, 2023

github-actions bot commented Jan 29, 2024

rlh1994 commented Jan 29, 2024

Add Precision argument to `equality` for numeric and float type columns #757

Add Precision argument to `equality` for numeric and float type columns #757

rlh1994 commented Jan 12, 2023 •

edited

Loading