
[ADAP-453] [Feature] Overwrite api.Column.string_type #665

Open

rlh1994 opened this issue Apr 13, 2023 · 6 comments

Labels: enhancement (New feature or request), help_wanted (Extra attention is needed)

Comments

rlh1994 commented Apr 13, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Currently the core dbt api.Column.string_type is not overridden; however, the default value is not suitable for BigQuery data types, since I believe BigQuery only supports the single string type STRING.
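
As a rough sketch of the requested change (the base-class behavior below matches what dbt-core renders today, as the repro later in this thread confirms; the BigQueryColumn override is hypothetical, not actual dbt-bigquery source):

class Column:
    # Mirrors the dbt-core default, which renders a Postgres-style type
    # that BigQuery does not recognize.
    @classmethod
    def string_type(cls, size: int) -> str:
        return "character varying({})".format(size)


class BigQueryColumn(Column):
    # Hypothetical override: BigQuery's native string type is STRING, so
    # the size argument could simply be ignored here.
    @classmethod
    def string_type(cls, size: int) -> str:
        return "string"


print(Column.string_type(4000))          # character varying(4000) -> invalid in BigQuery
print(BigQueryColumn.string_type(4000))  # string                  -> valid in BigQuery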

Describe alternatives you've considered

Using dbt.type_string, but this isn't suitable in all use cases; see dbt-labs/dbt-core#7103.

Who will this benefit?

Cross-warehouse users.

Are you interested in contributing this feature?

No response

Anything else?

No response

rlh1994 added the enhancement and triage labels Apr 13, 2023
github-actions bot changed the title from [Feature] Overwrite api.Column.string_type to [ADAP-453] [Feature] Overwrite api.Column.string_type Apr 13, 2023
dbeatty10 (Contributor) commented Apr 13, 2023

Thanks for opening this issue, @rlh1994!

I know we discussed this briefly here, but could you provide step-by-step instructions to reproduce the error you are experiencing?

rlh1994 (Author) commented Apr 13, 2023

Yeah sure, sorry. As a heads-up, this issue is the same in Databricks (and Spark, I assume) as well. Do you want me to raise a separate issue in those adapters?

The easiest way to see it is with a basic model that just casts something to the string type:

{{
  config(
    materialized = 'table',
    )
}}

select
    cast(10 as {{api.Column.string_type(4000)}}) as test_col

Which, in the case of a BigQuery target, leads to the following compiled code:

select
    cast(10 as character varying(4000)) as test_col

and a dbt run gives the following error output:

 dbt run --target bigquery
17:51:06  Running with dbt=1.4.5
17:51:06  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 337 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
17:51:06  
17:51:07  Concurrency: 1 threads (target='bigquery')
17:51:07  
17:51:07  1 of 1 START sql table model dbt_ryan.test_models .............................. [RUN]
17:51:09  BigQuery adapter: https://console.cloud.google.com/bigquery?project=***&page=queryresults
17:51:09  1 of 1 ERROR creating sql table model dbt_ryan.test_models ..................... [ERROR in 1.85s]
17:51:09  
17:51:09  Finished running 1 table model in 0 hours 0 minutes and 2.48 seconds (2.48s).
17:51:09  
17:51:09  Completed with 1 error and 0 warnings:
17:51:09  
17:51:09  Database Error in model test_models (models/test_models.sql)
17:51:09    Syntax error: Expected ")" but got identifier "varying" at [14:26]
17:51:09    compiled Code at target/run/dbt_demo/models/test_models.sql
17:51:09  
17:51:09  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Because there is no overridden function, it uses the core class default, which is character varying; that type doesn't exist in BigQuery, so this API call simply errors on BigQuery targets.

I could swap out the api.Column.string_type(4000) for type_string(), since no one really uses length limits in BigQuery and the default string type is already max size. But, as in the other issue, type_string() doesn't give the option to specify the length, which generates a 256-length string on Redshift. So I can't build a model (or macro) that is easily cross-warehouse; instead I have to split the models or dispatch the macro (sketched below).
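
A minimal sketch of that dispatch workaround, using dbt's adapter.dispatch pattern (the macro names are illustrative, not from an actual project):

{% macro portable_string_type(size) %}
  {{ return(adapter.dispatch('portable_string_type')(size)) }}
{% endmacro %}

{% macro default__portable_string_type(size) %}
  {{ return(api.Column.string_type(size)) }}
{% endmacro %}

{% macro bigquery__portable_string_type(size) %}
  string
{% endmacro %}

A model can then call {{ portable_string_type(4000) }} everywhere, falling back to api.Column.string_type on adapters where that works, which is exactly the per-adapter splitting the feature request is trying to avoid.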

dbeatty10 (Contributor) commented Apr 13, 2023

Thanks @rlh1994 -- that is a perfect example 🏆

Based on the documentation for the string_type static class method here, we'd expect the following query to work across adapters:

select cast(10 as {{ api.Column.string_type(4000) }})

Suspected root cause

The primary testing for these two methods (string_type and numeric_type) appears tautological to me, rather than actually exercising platform-specific data types.

Next steps

On my side, I've created an issue in dbt-core to add functional tests for the string_type and numeric_type adapter class methods. From there, the implementation of the adapter-specific method(s) should inherit the (new) functional tests from dbt-core to confirm the fix.
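
For a flavor of what a non-tautological test could look like, here is a minimal sketch using dbt's pytest-based functional-testing utilities (the test class and model name are hypothetical, not the tests actually added in dbt-core):

import pytest
from dbt.tests.util import run_dbt

_MODEL_SQL = """
select cast(10 as {{ api.Column.string_type(4000) }}) as test_col
"""

class TestStringType:
    @pytest.fixture(scope="class")
    def models(self):
        return {"string_type_model.sql": _MODEL_SQL}

    def test_string_type_runs_on_this_adapter(self, project):
        # Rather than comparing string_type() to its own output, actually
        # build a model on the target platform, so an invalid type such as
        # "character varying" fails loudly on BigQuery.
        results = run_dbt(["run"])
        assert len(results) == 1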

Could you raise an issue in dbt-spark as well? Once the fix is merged, dbt-databricks will inherit it.

github-actions bot commented Oct 11, 2023

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions bot added the Stale label Oct 11, 2023
rlh1994 (Author) commented Oct 11, 2023

Nope

dbeatty10 removed the Stale label Oct 11, 2023
dbeatty10 (Contributor) commented Oct 11, 2023

@rlh1994 I just removed the Stale label.
