-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix gateway preStop hook #1200
Fix gateway preStop hook #1200
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: norbjd The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1200 +/- ##
=======================================
Coverage 80.98% 80.98%
=======================================
Files 18 18
Lines 1499 1499
=======================================
Hits 1214 1214
Misses 223 223
Partials 62 62 ☔ View full report in Codecov by Sentry. |
/retest |
I think I'd prefer the approach that @skonto proposed here: #1118 (comment). Using the health-checks as is is not really doing a proper graceful shutdown and draining existing connections. /hold |
Yep, I completely agree, but this PR was more about fixing the existing behavior (hook is not doing what it is supposed to do right now, because However, if you think this PR is an useless step and we should go directly to the target (rewrite that hook and correctly handle draining) then that's fine, just tell me, and I'll close the PR and directly work on draining 👍 I've started something here: #1118 (comment), but I need to do more testing. |
Closing, superseded by #1203. |
Changes
/kind bug
While investigating to understand the issue #1118, I've noticed that the current
preStop
hook for the gateway (envoy) did not work at all. The reason is thatcurl
is not installed in Envoy's default image (it might have been here at some point, but not anymore). As a result, the call to/healthcheck/fail
, to make Envoy fail health checks before shutting down, did not work.To replace
curl
, there were many solutions I've thought of:curl
in it: too overkill for our needs (I don't think we want to maintain a separate Envoy image just for a hook). The ideal solution would have been to havecurl
installed in the envoy official image, but we can't expect thatcurl
in an init container and copy it to envoy container using a volume: this looks way too magic. The reason we need a static binary is we can't just copy thecurl
binary fromapt install -y curl
because this needs some other dependenciescurl
, e.g. installnc
orsocat
(or other tools I'm not aware of) to be able to communicate with the/tmp/envoy.admin
socketI've chosen 4 because it seemed to be the "least" impactful change (having an init container to install a binary used only in the hook is... meh). To communicate with Envoy admin interface using default shell tools, I had to "open" the route to
/healthcheck/fail
endpoint. We need to pass through the HTTP endpoint because we can't write directly to the socket.It was not really easy to see this error, as no
FailedPreStopHook
event were sent, since the last command of the hook (sleep 15
) did succeed. This is why I've also added aset -e
command in the hook: if one command fails in the script, it will return an error, and throw aFailedPreStopHook
.Alternative
An other alternative (simpler) is to only keep the pause (
sleep 15
) in thepreStop
hook, like in this tutorial. But I'm not sure that in this case, when the pod is inTerminating
state, no traffic will be redirected to it. Let's discuss in the comments if you feel this is a better solution.Release Note