[ADAP-1062] [Bug] Retries on wait for result step is recreating the whole job #1045

Closed
2 tasks done
github-christophe-oudar opened this issue Dec 6, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@github-christophe-oudar
Contributor

github-christophe-oudar commented Dec 6, 2023

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Retries during the wait-for-result step recreate the whole BigQuery job, which means the run is duplicated.

This can lead to a few problems:

  • If the model is not 100% idempotent for some reason, it can lead to unexpected results such as duplicated data
  • The incurred cost can double
  • For long-running models, it can double the processing time and therefore add unexpected downstream latency

It took 2-3 engineers a few hours to pinpoint the origin of the problem, as we didn't expect that kind of behavior.
To work around the issue, we set retries: 0 on the production connection to make sure we always rerun whole models fully, to be safe.

Expected Behavior

If a job has been started successfully, a retry should reconnect to that job and poll its status rather than create a new one.
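A minimal sketch of the expected behavior, assuming a hypothetical in-memory client: FakeClient, submit, wait and get_job below are illustrative stand-ins, not the google-cloud-bigquery or dbt-bigquery API. The job is submitted exactly once, and a transient error during the wait only re-fetches the same job by ID before waiting again.

```python
import itertools


class FakeJob:
    """Hypothetical stand-in for a BigQuery query job."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.wait_calls = 0


class FakeClient:
    """Hypothetical in-memory client (not the real BigQuery client API)."""

    def __init__(self, failures_before_success):
        self.jobs = {}
        self._ids = itertools.count(1)
        self._failures_left = failures_before_success

    def submit(self, sql):
        job = FakeJob(f"job-{next(self._ids)}")
        self.jobs[job.job_id] = job
        return job

    def get_job(self, job_id):
        return self.jobs[job_id]

    def wait(self, job):
        job.wait_calls += 1
        if self._failures_left > 0:
            self._failures_left -= 1
            raise ConnectionError("Remote end closed connection without response")
        return "DONE"


def run_with_reattach(client, sql, max_retries=3):
    """Submit once; on a transient error, re-fetch the same job by ID and keep waiting."""
    job = client.submit(sql)  # the job is created exactly once
    for attempt in range(max_retries + 1):
        try:
            return job.job_id, client.wait(job)
        except ConnectionError:
            if attempt == max_retries:
                raise
            job = client.get_job(job.job_id)  # reattach instead of resubmitting
```

With FakeClient(failures_before_success=1), run_with_reattach returns ("job-1", "DONE") and only one job is ever recorded, even though the first wait failed.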

Steps To Reproduce

This bug is hard to reproduce, as it depends on Google's API returning an error or the network failing during the request.
It might be mimicked by changing the code to return an error instead of waiting for the result.
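The suspected failure mode can be mimicked without BigQuery, under the assumption that the retry wraps both job submission and the wait. In this hypothetical sketch (FlakyClient and run_with_naive_retry are illustrative, not dbt-bigquery code), the first wait fails with a connection error like the one in the log, and the retry submits a second job while the first one is never cancelled.

```python
import itertools


class FlakyClient:
    """Hypothetical in-memory client whose first wait() calls fail."""

    def __init__(self, failures):
        self.submitted = []  # every job ID ever created
        self._ids = itertools.count(1)
        self._failures_left = failures

    def submit(self, sql):
        job_id = f"job-{next(self._ids)}"
        self.submitted.append(job_id)
        return job_id

    def wait(self, job_id):
        if self._failures_left > 0:
            self._failures_left -= 1
            raise ConnectionError("Connection aborted.")
        return "DONE"


def run_with_naive_retry(client, sql, max_retries=3):
    """Retries the whole submit+wait sequence, mirroring the behavior reported here."""
    for attempt in range(max_retries + 1):
        try:
            job_id = client.submit(sql)  # a fresh job on every attempt
            return job_id, client.wait(job_id)
        except ConnectionError:
            if attempt == max_retries:
                raise
```

With FlakyClient(failures=1), the call returns ("job-2", "DONE") and client.submitted is ["job-1", "job-2"]: one transient error during the wait is enough to create a duplicate job.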

Relevant log output

2023-12-02T04:43:31.713+01:00	03:43:31 BigQuery adapter: https://console.cloud.google.com/bigquery?project=prodcution&j=bq:US:11111111-2222-3333-4444-aaaaaaaaaaaa&page=queryresults

... 1.25 hours later

2023-12-02T05:55:12.921+01:00	04:55:12 BigQuery adapter: Retry attempt 1 of 3 after error: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))

2023-12-02T05:55:12.922+01:00	04:55:12 BigQuery adapter: Reopening connection after ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))

2023-12-02T05:55:14.347+01:00	04:55:14 BigQuery adapter: https://console.cloud.google.com/bigquery?project=teads-prod-analytics-etl&j=bq:US:11111111-2222-3333-4444-bbbbbbbbbbbb&page=queryresults

Environment

- OS: Ubuntu 20.04
- Python: 3.10
- dbt-core: 1.7.0
- dbt-bigquery: 1.7.0

Additional Context

No response

github-christophe-oudar added the bug and triage labels on Dec 6, 2023
github-actions bot changed the title from "[Bug] Retries on wait for result step is recreating the whole job" to "[ADAP-1062] [Bug] Retries on wait for result step is recreating the whole job" on Dec 6, 2023
dbeatty10 removed the triage label on Dec 7, 2023
McKnight-42 self-assigned this on Dec 20, 2023
@McKnight-42
Contributor

McKnight-42 commented Dec 20, 2023

Closing, as this ties in with #1042 and #977, which have an ongoing conversation.

@github-christophe-oudar
Contributor Author

Note that the #977 approach could work too, but only if you rerun the dbt command a second time.
I'm also wondering whether it actually works, or whether it would fail with an error saying a job with the provided ID already exists (since the tests use mocks).
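One way a rerun could sidestep the duplicate-ID failure is to treat the conflict as a signal to reattach to the existing job. This is a hypothetical sketch, not the #977 implementation: the ID derivation, DuplicateJobIdError and FakeJobStore are illustrative assumptions standing in for BigQuery rejecting a job insert whose ID already exists.

```python
import hashlib


class DuplicateJobIdError(Exception):
    """Hypothetical stand-in for the conflict error raised on a duplicate job ID."""


class FakeJobStore:
    """Hypothetical in-memory job registry that rejects duplicate job IDs."""

    def __init__(self):
        self.jobs = {}

    def insert(self, job_id, sql):
        if job_id in self.jobs:
            raise DuplicateJobIdError(job_id)
        self.jobs[job_id] = sql


def deterministic_job_id(node_id, sql):
    """Derive a stable job ID from the model and its SQL (hypothetical scheme)."""
    digest = hashlib.sha256(f"{node_id}\n{sql}".encode("utf-8")).hexdigest()[:16]
    return f"dbt_{digest}"


def submit_idempotently(store, node_id, sql):
    """Create the job, or reattach to it if a job with the same ID already exists."""
    job_id = deterministic_job_id(node_id, sql)
    try:
        store.insert(job_id, sql)
        return job_id, "created"
    except DuplicateJobIdError:
        # Same ID already submitted: reuse that job instead of failing or duplicating.
        return job_id, "reattached"
```

Calling submit_idempotently twice with the same node ID and SQL yields the same job ID, first with "created" and then with "reattached". Deriving the ID only from the model and its SQL also means an unchanged model rerun on a later day would collide with the old job, so a real scheme would likely need a run-scoped component as well.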
