[Regression] dbt seed is not accepting custom delimiter in the seed configs #1352

Closed
2 tasks done
roshravoof opened this issue Sep 20, 2024 · 5 comments
Labels
bug Something isn't working regression

roshravoof commented Sep 20, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

dbt seed is not accepting custom/pipe delimiter in the seed configs

seeds:
  - name: mappings
    config:
      delimiter: '|'

The seed config above doesn't work in dbt version 1.7.18.

Expected Behavior

dbt seed should accept any custom or multiple delimiters in the seed configs. dbt seed should be able to process comma and pipe delimited files in the same project.

Steps To Reproduce

Set up dbt version 1.7.18 and Python version 3.11.

Set up the seed config:

seeds:
  - name: mappings
    config:
      delimiter: '|'

Relevant log output

No response

Environment

- Python: 3.11
- dbt: 1.7.18

Which database adapter are you using with dbt?

bigquery

Additional Context

No response

@roshravoof roshravoof added bug Something isn't working triage labels Sep 20, 2024
@dbeatty10
Contributor

Thanks for reporting this @roshravoof !

I was able to replicate what you described.

This only looks like it affects dbt-bigquery (and not dbt-postgres, dbt-snowflake, etc), so I'm going to transfer this issue to the dbt-bigquery repo.

Reprex

Create these files:

seeds/mappings.csv

id|alpha
1|A
2|B
3|C

seeds/_seeds.yml

seeds:
  - name: mappings
    config:
      delimiter: '|'

Run these commands:

dbt seed
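
For convenience, the two files above can also be generated with a short script. This is a minimal sketch and not part of the original report: the helper name `write_reprex` and the use of a temporary directory are assumptions made for illustration, and the file contents are copied from the reprex as posted.

```python
from pathlib import Path
import tempfile


def write_reprex(project_dir: Path) -> None:
    """Write the two reprex files from this thread under project_dir/seeds.

    `write_reprex` is a hypothetical helper, not part of dbt.
    """
    seeds = project_dir / "seeds"
    seeds.mkdir(parents=True, exist_ok=True)

    # Pipe-delimited seed file, as posted in the reprex.
    (seeds / "mappings.csv").write_text("id|alpha\n1|A\n2|B\n3|C\n")

    # Seed properties file that sets the custom delimiter.
    (seeds / "_seeds.yml").write_text(
        "seeds:\n"
        "  - name: mappings\n"
        "    config:\n"
        "      delimiter: '|'\n"
    )


if __name__ == "__main__":
    # Assumption: a throwaway directory stands in for the dbt project root.
    target = Path(tempfile.mkdtemp())
    write_reprex(target)
    print(f"Reprex files written under {target / 'seeds'}")
```

In a real project you would point `write_reprex` at the project root and then run `dbt seed` from there.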

See this output in dbt 1.6:

$ dbt seed
12:47:00  Running with dbt=1.6.5
12:47:34  Registered adapter: bigquery=1.6.9
12:47:34  Unable to do partial parsing because saved manifest not found. Starting full parse.
12:47:35  Found 1 model, 1 seed, 0 sources, 0 exposures, 0 metrics, 394 macros, 0 groups, 0 semantic models
12:47:35  
12:47:59  Concurrency: 10 threads (target='blue')
12:47:59  
12:47:59  1 of 1 START seed file dbt_dbeatty.mappings .................................... [RUN]
12:48:05  1 of 1 OK loaded seed file dbt_dbeatty.mappings ................................ [INSERT 3 in 5.95s]
12:48:05  
12:48:05  Finished running 1 seed in 0 hours 0 minutes and 30.38 seconds (30.38s).
12:48:05  
12:48:05  Completed successfully
12:48:05  
12:48:05  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

See this output in dbt 1.7 and 1.8:

$ dbt seed           
12:48:57  Running with dbt=1.7.11
12:48:59  Registered adapter: bigquery=1.7.8
12:48:59  Unable to do partial parsing because saved manifest not found. Starting full parse.
12:49:00  Found 1 model, 1 seed, 0 sources, 0 exposures, 0 metrics, 454 macros, 0 groups, 0 semantic models
12:49:00  
12:49:33  Concurrency: 10 threads (target='blue')
12:49:33  
12:49:33  1 of 1 START seed file dbt_dbeatty.mappings .................................... [RUN]
12:49:36  1 of 1 ERROR loading seed file dbt_dbeatty.mappings ............................ [ERROR in 3.50s]
12:49:36  
12:49:36  Finished running 1 seed in 0 hours 0 minutes and 36.08 seconds (36.08s).
12:49:36  
12:49:36  Completed with 1 error and 0 warnings:
12:49:36  
12:49:36    Runtime Error in seed mappings (seeds/mappings.csv)
  Error while reading data, error message: CSV processing encountered too many errors, giving up. Rows: 0; errors: 3; max bad: 0; error percent: 0
  Error while reading data, error message: CSV table references column position 1, but line contains only 1 columns.; line_number: 2 byte_offset_to_start_of_line: 9 column_index: 1 column_name: "alpha" column_type: STRING
  Error while reading data, error message: CSV table references column position 1, but line contains only 1 columns.; line_number: 3 byte_offset_to_start_of_line: 13 column_index: 1 column_name: "alpha" column_type: STRING
  Error while reading data, error message: CSV table references column position 1, but line contains only 1 columns.; line_number: 4 byte_offset_to_start_of_line: 17 column_index: 1 column_name: "alpha" column_type: STRING
  You are loading data without specifying data format, data will be treated as CSV format by default. If this is not what you mean, please specify data format by --source_format.
12:49:36  
12:49:36  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
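
The `column position 1, but line contains only 1 columns` errors above are consistent with each data row being split on a comma rather than on `|`. The sketch below illustrates that effect with Python's standard `csv` module. It is not dbt-bigquery's or BigQuery's actual loader, and `parse` and `SEED_TEXT` are hypothetical names used only for this example.

```python
import csv
import io

# Contents of seeds/mappings.csv from the reprex.
SEED_TEXT = "id|alpha\n1|A\n2|B\n3|C\n"


def parse(text: str, delimiter: str) -> list[list[str]]:
    """Parse delimited text into rows using the given single-character delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Default comma delimiter: no commas in the file, so every row is one field.
comma_rows = parse(SEED_TEXT, ",")
# Configured pipe delimiter: every row splits into two fields.
pipe_rows = parse(SEED_TEXT, "|")

print(comma_rows[1])  # ['1|A']
print(pipe_rows[1])   # ['1', 'A']
```

With a comma delimiter each row has a single field, so a second column (`alpha`, at position 1) cannot be found, which matches the shape of the error reported for lines 2 to 4.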

@dbeatty10 dbeatty10 transferred this issue from dbt-labs/dbt-core Sep 23, 2024
@dbeatty10 dbeatty10 removed the triage label Sep 23, 2024
@dbeatty10 dbeatty10 changed the title [Bug] dbt seed is not accepting custom delimiter in the seed configs [Regression] dbt seed is not accepting custom delimiter in the seed configs Sep 23, 2024
@simbazzuk

@dbeatty10 is this still an issue? It works with the latest version. Is there anything I can do on this issue?

@colin-rogers-dbt colin-rogers-dbt self-assigned this Oct 11, 2024
@colin-rogers-dbt
Contributor

Took a look at this. What's interesting is that it seems to work if you do it in dbt_project.yml like:

seeds:
  jaffle_shop:
    mappings:
      config:
        delimiter: '|'

@colin-rogers-dbt
Contributor

Will investigate what dbt-bigquery is handling differently here, and how.

@colin-rogers-dbt
Contributor

colin-rogers-dbt commented Oct 21, 2024

So after much investigating, it's not clear exactly what broke this functionality in 1.7, but I can confirm it works in 1.6. This was already being fixed (see #1122) in the upcoming 1.9 release, but we'll look at backporting to 1.8 as well.
