Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Concurrency issues in snowflake incremental models. #1123

Open
2 tasks done
rattata2me opened this issue Jul 18, 2024 · 3 comments
Open
2 tasks done

[Feature] Concurrency issues in snowflake incremental models. #1123

rattata2me opened this issue Jul 18, 2024 · 3 comments

Comments

@rattata2me
Copy link

Is this a new bug in dbt-snowflake?

  • I believe this is a new bug in dbt-snowflake
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When running an incremental table with the on_schema_change policy set to append_new_columns, a race condition can occur. If two jobs perform the column check at the same time, the same ALTER TABLE statement will be generated, resulting in a SQL compilation error on the slower executor:

SQL Compilation error:
ERROR
  column '<new_column>' already exists

The core issue is that we cannot ensure the column schema remains unchanged between the column check operation and the ALTER TABLE statement.

Expected Behavior

To avoid this issue, the process should either utilize the IF NOT EXISTS and IF EXISTS conditions provided by Snowflake or make use of transactions so that column checkup and update is atomic.

Steps To Reproduce

  1. Create an Incremental Table:
    Define and create an incremental table in your dbt project.
  2. Add a Column:
    Add a new column entry to the model.
  3. Concurrently Execute dbt run:
    Execute the dbt process in two separate jobs concurrently.

Relevant log output

No response

Environment

- OS:Ubuntu 22.04
- Python: 3.9
- dbt-core: 1.5.7
- dbt-snowflake: 1.5.7

Additional Context

No response

@amychen1776
Copy link

@rattata2me Could you say more about your use case here? Is there a reason why you would want to have concurrent execution against the same target (especially given that you are changing the target table schema with a new column).

@rattata2me
Copy link
Author

@rattata2me Could you say more about your use case here? Is there a reason why you would want to have concurrent execution against the same target (especially given that you are changing the target table schema with a new column).

I have to dynamically update the table contents every time there is a new data ingestion, the data ingestion can be concurrent; coming from different users at the same time. This is a critical process which has to have 100% up time so I depend on the logic of on schema change to roll out column changes while keeping the service running.

@amychen1776
Copy link

Thank you for the explanation!

@rattata2me rattata2me changed the title [Bug] Concurrency issues in snowflake incremental models. [Feature] Concurrency issues in snowflake incremental models. Jul 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants