Type string macro handling all cases #475

Merged


mweso-softserve
Contributor

Creates a macro that redefines type_string for the models in the dbt_project_evaluator package. When using the package, there is no longer any need to override type_string for all models in a project.
This attempt uses api.Column.string_type(600) for all databases except BigQuery.
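
A minimal sketch of what such a package-scoped macro could look like, assuming the standard `adapter.dispatch` pattern. The macro name `dpe_type_string` is hypothetical and chosen only to avoid colliding with `dbt.type_string()`; it is not necessarily the name or layout used in the merged code.

```sql
{# Hypothetical sketch: a string type macro scoped to the package namespace. #}
{% macro dpe_type_string() %}
  {# Dispatch within the dbt_project_evaluator namespace only. #}
  {{ return(adapter.dispatch('dpe_type_string', 'dbt_project_evaluator')()) }}
{% endmacro %}

{# Default for most adapters: a bounded string type, as the description states. #}
{% macro default__dpe_type_string() %}
  {{ return(api.Column.string_type(600)) }}
{% endmacro %}

{# BigQuery is the stated exception; here it falls back to dbt's own macro. #}
{% macro bigquery__dpe_type_string() %}
  {{ return(dbt.type_string()) }}
{% endmacro %}
```

Because the implementations live under their own macro name, nothing here changes what `dbt.type_string()` returns for models outside the package.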

This is a:

  • bug fix PR with no breaking changes
  • new functionality

Link to Issue

Closes #469

Description & motivation

This implementation replaces the type_string() macro, which doesn't work on Redshift for tables defined the way dbt-project-evaluator implemented them, with api.Column.string_type(600).

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
    • Databricks
    • DuckDB
    • Trino/Starburst
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)

I tested this on a local version of this package without any dispatch configuration.

Michał Wesołowski added 2 commits July 3, 2024 10:26
…y models in dbt_project_evaluator package. No need to override it for all models in a project anymore when using package.
@b-per
Collaborator

b-per commented Jul 25, 2024

I had a chat with the team and we don't really want to introduce a dbt_project_evaluator version of type_string().

We understand that with #469, the first time CI runs after the package is added, it might pick up more models than needed, but this should happen only once and could be handled manually.

I am going on leave for a couple of weeks and will want to revisit the few different issues we have around strings with Redshift but this particular PR is likely not one we would want to merge to this repo.

@mweso-softserve
Contributor Author

@b-per

> I had a chat with the team and we don't really want to introduce a dbt_project_evaluator version of type_string().
>
> We understand that with #469, the first time CI runs after the package is added, it might pick up more models than needed, but this should happen only once and could be handled manually.
>
> I am going on leave for a couple of weeks and will want to revisit the few different issues we have around strings with Redshift but this particular PR is likely not one we would want to merge to this repo.

dbt_project_evaluator has already introduced its own version of type_string(); however, the way it did so changes the type definition for all Redshift models. I'm not saying this PR is definitely the way to go, but its benefit is that it redefines type_string() for dbt_project_evaluator's models only, not for the entire project. Imagine I want to use two different dbt packages that handle this kind of problem in two different ways. Both would try to redefine type_string() for the entire project and, depending on the dispatch config, one of them would always win, potentially breaking the functionality of the other.

It's not only a matter of models being marked as changed the first time it runs; the override ultimately redefines all existing models wherever the string type is used.
I don't agree that any package should modify type definitions in existing models simply by adding its configuration, unless that's the sole purpose of the package.
Please revisit the approach.

@glsdown
Contributor

glsdown commented Jul 25, 2024

I have to agree with @mweso-softserve here. In your current approach you have overwritten a core macro, type_string, which is used across every model in every Redshift dbt project. This PR instead namespaces that usage so it only applies to project evaluator models.
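
To illustrate the namespacing point, here is a rough sketch of how the two call sites could differ. The macro name `dpe_type_string`, the column names, and the referenced model `some_model` are hypothetical and only serve the example.

```sql
{# Inside a dbt_project_evaluator model: uses the package-scoped macro, #}
{# so only the package's own columns get the bounded string type. #}
select
    cast(null as {{ dbt_project_evaluator.dpe_type_string() }}) as resource_name

{# Inside a model of the user's project (a separate file): still resolves #}
{# to the adapter's own type_string, unaffected by the package. #}
select
    cast(id as {{ dbt.type_string() }}) as id_text
from {{ ref('some_model') }}
```

Under this assumption, adding the package would leave the compiled SQL of existing project models unchanged, so they would not be flagged as modified.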

@b-per
Collaborator

b-per commented Aug 13, 2024

Thanks for the feedback. I was out of office for a bit but I will try to talk again with the team by next week.

@b-per
Collaborator

b-per commented Aug 19, 2024

We had a chat internally today.

As both of you raise the same point, we are going forward with this approach, creating a dbt_project_evaluator.type_string() macro.

Hopefully it should help people loading strings in Redshift, about which a few issues have been raised.

@b-per b-per merged commit 68486e5 into dbt-labs:main Aug 19, 2024
7 checks passed
Successfully merging this pull request may close these issues.

redshift__type_string overwrite is causing all existing models to be marked as modified
4 participants