Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix base_source_columns and base_node_columns in Redshift #467

Conversation

stevenconnorg
Copy link

@stevenconnorg stevenconnorg commented Jun 24, 2024

This is a:

  • bug fix PR with no breaking changes
  • new functionality

Link to Issue

Description & motivation

Integration Test Screenshot

Screenshot 2024-06-24 at 1 13 07 PM

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
    • Databricks
    • DuckDB
    • Trino/Starburst
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)

@stevenconnorg stevenconnorg changed the title use type_large_string() and consolidate schemas for base column models Fix base_source_columns and base_nodel_columns in Redshift Jun 24, 2024
@stevenconnorg stevenconnorg changed the title Fix base_source_columns and base_nodel_columns in Redshift Fix base_source_columns and base_node_columns in Redshift Jun 24, 2024
@b-per
Copy link
Collaborator

b-per commented Jun 25, 2024

Hi @stevenconnorg

Thanks for the work on this one!

At first, I am a bit worried of moving to varchar(50000) and changing all the type_string() in those models to type_large_string() for all users of the package.

From what I read here , I understand that Redshift does some optimization/memory allocation based on the types and I wouldn't want people who don't have any issue today with Redshift to be negatively impacted by performance following this change.

In your specific case, do you know what specific column(s) was/were creating the issue? And what varchar size this column has?

We might then be able to target the changes required to fix the reported issues.

@stevenconnorg
Copy link
Author

stevenconnorg commented Jun 25, 2024

Hey @b-per

For some reason, dbt.type_string() is creating character varying(1) columns, so this is causing the majority of the issues in these models (also reported in #463)

However, I'm still running into issues with redshift__type_large_string() in the description column, with the max string length around ~13k characters.

So maybe we bump the redshift__type_large_string() length to 15-20 thousand, and then figure out what's going on with the dbt.type_string()?

@b-per
Copy link
Collaborator

b-per commented Jun 26, 2024

Yes, I think it is better to get to the bottom of it.

Do you have jq installed? If not, could you install it and do (I am on Mac, but we could do something similar in Windows)

cat target/manifest.json | jq '.macros | .[] | select(.name | contains("type_string"))'

and copy the results here?

My results look like

{
  "name": "type_string",
  "resource_type": "macro",
  "package_name": "dbt",
  "path": "macros/utils/data_types.sql",
  "original_file_path": "macros/utils/data_types.sql",
  "unique_id": "macro.dbt.type_string",
  "macro_sql": "\n\n{%- macro type_string() -%}\n  {{ return(adapter.dispatch('type_string', 'dbt')()) }}\n{%- endmacro -%}\n\n",
  "depends_on": {
    "macros": [
      "macro.dbt.default__type_string"
    ]
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719397302.694219,
  "supported_languages": null
}
{
  "name": "default__type_string",
  "resource_type": "macro",
  "package_name": "dbt",
  "path": "macros/utils/data_types.sql",
  "original_file_path": "macros/utils/data_types.sql",
  "unique_id": "macro.dbt.default__type_string",
  "macro_sql": "{% macro default__type_string() %}\n    {{ return(api.Column.translate_type(\"string\")) }}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719397302.694326,
  "supported_languages": null
}
{
  "name": "redshift__type_string",
  "resource_type": "macro",
  "package_name": "dbt_project_evaluator",
  "path": "macros/cross_db_shim/redshift_shims.sql",
  "original_file_path": "macros/cross_db_shim/redshift_shims.sql",
  "unique_id": "macro.dbt_project_evaluator.redshift__type_string",
  "macro_sql": "{%- macro redshift__type_string() -%}\n  {{ \"VARCHAR(600)\" }}\n{%- endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719397302.810622,
  "supported_languages": null
}

@stevenconnorg
Copy link
Author

Hey @b-per , here are the results from the jq command:

{
  "name": "type_string",
  "resource_type": "macro",
  "package_name": "dbt",
  "path": "macros/utils/data_types.sql",
  "original_file_path": "macros/utils/data_types.sql",
  "unique_id": "macro.dbt.type_string",
  "macro_sql": "\n\n{%- macro type_string() -%}\n  {{ return(adapter.dispatch('type_string', 'dbt')()) }}\n{%- endmacro -%}\n\n",
  "depends_on": {
    "macros": [
      "macro.dbt.default__type_string"
    ]
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.668191,
  "supported_languages": null
}
{
  "name": "default__type_string",
  "resource_type": "macro",
  "package_name": "dbt",
  "path": "macros/utils/data_types.sql",
  "original_file_path": "macros/utils/data_types.sql",
  "unique_id": "macro.dbt.default__type_string",
  "macro_sql": "{% macro default__type_string() %}\n    {{ return(api.Column.translate_type(\"string\")) }}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.668301,
  "supported_languages": null
}
{
  "name": "type_string",
  "resource_type": "macro",
  "package_name": "dbt_profiler",
  "path": "macros/cross_db_utils.sql",
  "original_file_path": "macros/cross_db_utils.sql",
  "unique_id": "macro.dbt_profiler.type_string",
  "macro_sql": "\n\n{%- macro type_string() -%}\n  {{ return(adapter.dispatch(\"type_string\", macro_namespace=\"dbt_profiler\")()) }}\n{%- endmacro -%}\n\n",
  "depends_on": {
    "macros": [
      "macro.dbt_profiler.default__type_string"
    ]
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.795249,
  "supported_languages": null
}
{
  "name": "default__type_string",
  "resource_type": "macro",
  "package_name": "dbt_profiler",
  "path": "macros/cross_db_utils.sql",
  "original_file_path": "macros/cross_db_utils.sql",
  "unique_id": "macro.dbt_profiler.default__type_string",
  "macro_sql": "{%- macro default__type_string() -%}\n  varchar\n{%- endmacro -%}\n\n",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.7953038,
  "supported_languages": null
}
{
  "name": "bigquery__type_string",
  "resource_type": "macro",
  "package_name": "dbt_profiler",
  "path": "macros/cross_db_utils.sql",
  "original_file_path": "macros/cross_db_utils.sql",
  "unique_id": "macro.dbt_profiler.bigquery__type_string",
  "macro_sql": "{%- macro bigquery__type_string() -%}\n  string\n{%- endmacro -%}\n\n",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.7953548,
  "supported_languages": null
}
{
  "name": "databricks__type_string",
  "resource_type": "macro",
  "package_name": "dbt_profiler",
  "path": "macros/cross_db_utils.sql",
  "original_file_path": "macros/cross_db_utils.sql",
  "unique_id": "macro.dbt_profiler.databricks__type_string",
  "macro_sql": "{%- macro databricks__type_string() -%}\n  string\n{%- endmacro -%}\n\n\n",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.795406,
  "supported_languages": null
}
{
  "name": "redshift__type_string",
  "resource_type": "macro",
  "package_name": "dbt_project_evaluator",
  "path": "macros/cross_db_shim/redshift_shims.sql",
  "original_file_path": "macros/cross_db_shim/redshift_shims.sql",
  "unique_id": "macro.dbt_project_evaluator.redshift__type_string",
  "macro_sql": "{%- macro redshift__type_string() -%}\n  {{ \"VARCHAR(600)\" }}\n{%- endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348795.9382882,
  "supported_languages": null
}
{
  "name": "edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.edr_type_string",
  "macro_sql": "\n\n\n{%- macro edr_type_string() -%}\n    {{ return(adapter.dispatch('edr_type_string', 'elementary')()) }}\n{%- endmacro -%}\n\n",
  "depends_on": {
    "macros": [
      "macro.elementary.postgres__edr_type_string"
    ]
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390201,
  "supported_languages": null
}
{
  "name": "default__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.default__edr_type_string",
  "macro_sql": "{% macro default__edr_type_string() %}\n    {# Redshift #}\n    {% do return(\"varchar(4096)\") %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390293,
  "supported_languages": null
}
{
  "name": "postgres__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.postgres__edr_type_string",
  "macro_sql": "{% macro postgres__edr_type_string() %}\n    {% if var(\"sync\", false) %}\n        {% do return(\"text\") %}\n    {% else %}\n        {% do return(\"varchar(4096)\") %}\n    {% endif %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390466,
  "supported_languages": null
}
{
  "name": "snowflake__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.snowflake__edr_type_string",
  "macro_sql": "{% macro snowflake__edr_type_string() %}\n    {# Default max varchar size in Snowflake is 16MB #}\n    {% do return(\"varchar\") %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.3905568,
  "supported_languages": null
}
{
  "name": "bigquery__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.bigquery__edr_type_string",
  "macro_sql": "{% macro bigquery__edr_type_string() %}\n    {# Default max string size in Bigquery is 65K #}\n    {% do return(\"string\") %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390642,
  "supported_languages": null
}
{
  "name": "spark__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.spark__edr_type_string",
  "macro_sql": "{% macro spark__edr_type_string() %}\n    {% do return(\"string\") %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390722,
  "supported_languages": null
}
{
  "name": "athena__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.athena__edr_type_string",
  "macro_sql": "{% macro athena__edr_type_string() %}\n    {% do return(\"varchar\") %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390801,
  "supported_languages": null
}
{
  "name": "trino__edr_type_string",
  "resource_type": "macro",
  "package_name": "elementary",
  "path": "macros/utils/data_types/data_type.sql",
  "original_file_path": "macros/utils/data_types/data_type.sql",
  "unique_id": "macro.elementary.trino__edr_type_string",
  "macro_sql": "{% macro trino__edr_type_string() %}\n    {% do return(\"varchar\") %}\n{% endmacro %}",
  "depends_on": {
    "macros": []
  },
  "description": "",
  "meta": {},
  "docs": {
    "show": true,
    "node_color": null
  },
  "patch_path": null,
  "arguments": [],
  "created_at": 1719348796.390879,
  "supported_languages": null
}

@b-per
Copy link
Collaborator

b-per commented Aug 20, 2024

With #475 implemented and released in the latest version, I think that this is not needed anymore. Let's open a new issue if there is still an issue.

@b-per b-per closed this Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants