Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

equal_rowcount, fewer_rows_than macros don't work on Trino due to lateral column aliasing #744

Closed
damian3031 opened this issue Dec 15, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@damian3031
Copy link

Describe the bug

Current implementations of fewer_rows_than and equal_rowcount are not working on warehouses which do not support lateral column aliasing.

Steps to reproduce

Use test fewer_rows_than or equal_rowcount on warehouse which do not support lateral column aliasing.

Expected results

Test is working.

Actual results

Receiving error Column 'id_dbtutils_test_equal_rowcount' cannot be resolved

Which database are you using dbt with?
Trino, but relevant to all warehouses which do not support lateral column aliasing

The output of dbt --version:
1.3.0

Additional context

In equal_rowcount macro, at line 35 column alias is created, which is later used in group by clause within same query. It won't work on warehouses which do not support lateral column aliasing.
We could change this group by and instead of specifying column names, we could pass column indexes, sth like group by 1, 2, 3. It will work then on warehouses which do not support lateral column aliasing. In addition, this style is recommended on dbt blog: https://www.getdbt.com/blog/write-better-sql-a-defense-of-group-by-1/

Are you interested in contributing the fix?

Yes, I have already prepared solution working on warehouses which do not support lateral column aliasing.

@joellabes
Copy link
Contributor

@damian3031 thanks for opening this!

Amusingly (?), in some cases T-SQL doesn't support group by n syntax which is the reason that these tests behave the way they do. Changing back to group by 1, 2, 3 will solve your problem, but go back to breaking tsql. You can read more about this here for example: #310

Officially, dbt utils only commits to supporting the dbt Labs-maintained adapters, but we have historically been open to making changes that support other adapters as long as they don't impact the main mission.

Now that we have two community-powered adapters whose needs are in conflict, I think that some combination of @dbeatty10, @dataders, maybe @Fleid and I will need to put our heads together and think about what a broader multi-adapter strategy on dbt utils looks like.

I doubt that we'll move this forward before the holiday break, but it's on my list for the new year!

@joellabes joellabes changed the title equal_rowcount, fewer_rows_than macros bug equal_rowcount, fewer_rows_than macros don't work on Trino Dec 16, 2022
@joellabes joellabes changed the title equal_rowcount, fewer_rows_than macros don't work on Trino equal_rowcount, fewer_rows_than macros don't work on Trino due to lateral column aliasing Dec 16, 2022
@dataders
Copy link
Contributor

dataders commented Dec 16, 2022

@damian3031 really appreciate putting the whole story together here.

More context is that @damian3031 is the maintainer of dbt-utils-trino in which equal_rowcount & fewer_rows_than have been completely overriden just because of the lateral aliasiing issue. However, I do agree that the default groupby clause implementation could be improved. I like the intent behind #746 if wrapping the groupby-clause generation into it's own dispatchable macro, and that dbt_utils.group_by is the natural macro to use. However, like @joellabes said, you can't group by column order references in TSQL (bc TIL lateral aliasing isn't possible in TSQL due to its All-at-Once operation model).

These test macros were working for TSQL and tsql-utils when I made them but:

  1. We didn't keep up with maintenance and updating versions for tsql-utils
  2. Implement group_by_columns argument for relevant tests #633 (re-implemented later by joel for some reason in 78afedd broke compatibility with tsql-utils, and,
  3. because this repo doesn't test for compatibility with it's corresponding shim packages, this breaking change went unnoticed.

Proposal:

  • adopt the approach of Fixed fewer_rows_than and equal_rowcount macros #746, but first perhaps extend dbt_utils.group_by to allow for group by clauses with actual column names/references.
  • discuss with core adapters eng team as this issue has also recently come up with dbt-core and adapters (around support for python 3.11 as part of dbt-core's 1.4 release) due to a lack of test coverage
  • determine best practices for maintainers of adapter-specific utils packages for keeping things up to date

👀 @sdebruyn @b-per @prdpsvs @nathaniel-may @Fleid

@joellabes
Copy link
Contributor

first perhaps extend dbt_utils.group_by to allow for group by clauses with actual column names/references.

I don't think I'm into this - feels like a massive overloading of the macro. To maintain backwards compatibility it would have to be something like:

{% macro group_by(groupings) %}
  {% if groupings is list %}
    group by {{ groupings | join(", ")
  {% else %}
    group by {{ range(1, groupings) | join(", ") }} /* I don't think this is the correct syntax, doesn't really matter*/
  {% endif %}
{% endmacro %}

Which would mean that all existing places where the macro is used still wouldn't work. And in any case there are also a lot of places where needing to know the specific column names is undesirable (e.g. I have done {{ dbt_utils.group_by(4 if something else 5) }})

@RobbertDM
Copy link

We also ran into this issue today. Maybe we can solve this by adding a shim in https://github.com/starburstdata/dbt-trino ?

@dbeatty10
Copy link
Contributor

@RobbertDM Are you already installing and using the trino_utils package?

starburstdata/dbt-trino-utils@7ac7c8b is where equal_rowcount & fewer_rows_than were completely overridden because of the lateral aliasing issue.

@RobbertDM
Copy link

RobbertDM commented Apr 6, 2023 via email

@github-actions
Copy link

github-actions bot commented Oct 4, 2023

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@dataders
Copy link
Contributor

dataders commented Oct 9, 2023

@damian3031 I opened #843 and #844 to more clearly represent the ask here. I'm going to close this particular instance because it was addressed already in starburstdata/dbt-trino-utils#11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants