Add basic reconciliation between executor RunState and kubernetes #2604

Merged 3 commits on Jun 23, 2023

@JamesMurkin JamesMurkin commented Jun 23, 2023

The executor can get into a state where its RunState records an active run but there is no pod backing that run.

As a result, the run can never finish (there is no pod to complete it) and stays in Pending/Running forever.

This PR adds some basic reconciliation: if there is no pod backing a RunState entry, an action is taken.
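As a rough illustration of the check described above, here is a minimal sketch in Go. It is not the code from this PR: `RunState`, `RunPhase`, and `FindOrphanedRuns` are hypothetical, simplified names, and the set of existing pods is assumed to be available as a map keyed by run ID.

```go
package main

import "fmt"

// RunPhase is a hypothetical stand-in for the phases the executor tracks.
type RunPhase string

const (
	Pending   RunPhase = "Pending"
	Running   RunPhase = "Running"
	Succeeded RunPhase = "Succeeded"
)

// RunState is a hypothetical, simplified view of one tracked run.
type RunState struct {
	RunID string
	Phase RunPhase
}

// FindOrphanedRuns returns the runs that are considered active
// (Pending or Running) but have no entry in podRunIDs, i.e. no pod
// backing them. podRunIDs is assumed to hold the run IDs of all pods
// that currently exist in the cluster.
func FindOrphanedRuns(runs []RunState, podRunIDs map[string]bool) []RunState {
	var orphaned []RunState
	for _, run := range runs {
		if run.Phase != Pending && run.Phase != Running {
			continue // finished runs do not need a backing pod
		}
		if !podRunIDs[run.RunID] {
			orphaned = append(orphaned, run)
		}
	}
	return orphaned
}

func main() {
	runs := []RunState{
		{RunID: "run-1", Phase: Running},
		{RunID: "run-2", Phase: Pending},
		{RunID: "run-3", Phase: Succeeded},
	}
	pods := map[string]bool{"run-1": true}

	for _, run := range FindOrphanedRuns(runs, pods) {
		fmt.Printf("run %s is %s but has no backing pod\n", run.RunID, run.Phase)
	}
}
```

A real implementation would probably also need some tolerance for pods that have been submitted but are not yet visible in the cluster; that is an assumption on my part and is not stated in this PR.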


This is largely a no-op with a few minor changes:

 - Move logging to happen before action is taken
 - Add logging for when issues self resolve
 - Don't break out of detectPodIssues - so we can detect more than 1 issue per round
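The commit notes about logging before acting and not breaking out of `detectPodIssues` can be sketched roughly as follows. This is an illustrative sketch only: `Pod`, `Issue`, `DetectIssues`, and `HandleIssues` are hypothetical names and do not reflect the actual contents of `pod_issue_handler.go`.

```go
package main

import (
	"errors"
	"log"
)

// Pod is a hypothetical, simplified pod record.
type Pod struct {
	Name  string
	Stuck bool
}

// Issue is a hypothetical record of one detected problem.
type Issue struct {
	PodName string
	Reason  string
}

// DetectIssues scans every pod and records each issue it finds.
// It deliberately does not break after the first hit, so more than
// one issue can be detected in a single round.
func DetectIssues(pods []Pod) []Issue {
	var issues []Issue
	for _, pod := range pods {
		if pod.Stuck {
			issues = append(issues, Issue{PodName: pod.Name, Reason: "stuck"})
		}
	}
	return issues
}

// HandleIssues logs each issue before running the action for it, so
// the log still shows what was attempted if the action fails.
func HandleIssues(issues []Issue, logf func(string, ...interface{}), act func(Issue) error) {
	for _, issue := range issues {
		logf("handling issue for pod %s: %s", issue.PodName, issue.Reason)
		if err := act(issue); err != nil {
			logf("action failed for pod %s: %v", issue.PodName, err)
		}
	}
}

func main() {
	pods := []Pod{{Name: "a", Stuck: true}, {Name: "b"}, {Name: "c", Stuck: true}}
	HandleIssues(DetectIssues(pods), log.Printf, func(i Issue) error {
		if i.PodName == "c" {
			return errors.New("simulated failure")
		}
		return nil
	})
}
```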
@JamesMurkin JamesMurkin marked this pull request as ready for review June 23, 2023 14:25
codecov bot commented Jun 23, 2023

Codecov Report

Patch coverage: 78.21% and project coverage change: +0.04 🎉

Comparison is base (c6466da) 58.60% compared to head (d4ae9c7) 58.65%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2604      +/-   ##
==========================================
+ Coverage   58.60%   58.65%   +0.04%     
==========================================
  Files         236      236              
  Lines       30357    30458     +101     
==========================================
+ Hits        17791    17865      +74     
- Misses      11199    11221      +22     
- Partials     1367     1372       +5     
| Flag | Coverage Δ |
|---|---|
| armada-server | 58.65% <78.21%> (+0.04%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| internal/executor/application.go | 7.44% <0.00%> (-0.06%) ⬇️ |
| internal/executor/service/pod_issue_handler.go | 75.00% <78.57%> (-0.44%) ⬇️ |
| internal/executor/job/job_run_state_store.go | 73.86% <100.00%> (+0.45%) ⬆️ |


@JamesMurkin JamesMurkin merged commit d59732a into master Jun 23, 2023
@JamesMurkin JamesMurkin deleted the reconcile_executor_state branch June 23, 2023 15:31