Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skipping health check on nodes if EC2 returns throttling errors #485

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

haouc
Copy link
Contributor

@haouc haouc commented Oct 21, 2024

Issue #, if available:

Description of changes:
The controller shouldn't health check on resources when the upstream API returns some specific errors, such as EC2 throttling, since those errors are not recoverable by failing the controller.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@haouc haouc requested a review from a team as a code owner October 21, 2024 02:09
Copy link
Contributor

@yash97 yash97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 🚀

pkg/node/manager/manager.go Show resolved Hide resolved
@@ -96,6 +98,8 @@ type AsyncOperationJob struct {
nodeName string
}

const pausingHealthCheckDuration = 30 * time.Minute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it better to try a exponential back off strategy here rather than wait for a fixed amount of time?

Most customers would recover from the failure pretty quickly I assume, we might not want to block health check for 30 mins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try to avoid more complicate exponential back off for this simple and safe skipping health check. How about 10 minutes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants