High Availability Considerations docs outdated #3052
Comments
I can create a PR, but a sanity check on the above conclusions first would be appreciated.
Is this also relevant to SIG Docs? I can't tell.
If
This one, which I think the official docs link to (but I'm not seeing the link at the moment): https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md
Yes, https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md ended up in this repo. The link to it is here:
If we can move that doc into the main website, I think that benefits our users. Not required though.
Moving it might break existing URL references. Also, frankly, I would prefer if kubeadm docs eventually start moving in the other direction, i.e. to this repo. If this repo starts hosting the kubeadm source code and has versioned branches, the kubeadm docs could be hosted similarly to other projects with Netlify / Hugo. But that's a discussion for another time.
From first glance (I have so far been unable to thoroughly review things in my development environment): it is true that HAProxy has undergone some changes leading to different configuration files. If this turns out to be an issue here, then providing two sets of example configurations would make sense. As far as the rest of the report is concerned, I will need to take a closer look. Improvements are always welcome. I hope to be able to do this on the weekend.
Thanks @mbert
Let's see. For now both versions are still used in the field, because the older one is still present in EL distros.
That's a good point. Users sometimes just install from the distro packaging.
I have now had some time to read through everything. First of all: I totally agree with @nijave, the guide is outdated here, and I think a PR with the proposed changes would be very welcome. The examples in the guide were, IIRC, created using HAProxy 1.8 on an EL7 platform, and given the changes in HAProxy since then, the configuration example should really get updated. Since what I still have is all based on that "ancient" HAProxy (I am not actively using the setup in my environment, so experimenting would require setting things up again first), I cannot quickly provide a configuration for version 2.1, because I have never had one in use. Long story short: I propose providing the configuration example for 2.8 as seen above (assuming that it has been tested and works), along with mentioning the version and the fact that it may not work for other versions. Regarding the health check: again, I totally agree that HAProxy should be checked on all nodes, not only the one with the VIP. Thank you for noticing!
@nijave would you send a PR for this as you suggested earlier?
I'm running HAProxy 2.4 (the default in Ubuntu 22.04) without any custom health check setup, and it works just fine:
I've been running an HAProxy load balancer in front of my K8s clusters with similar configurations for at least four years, on several clusters, and have never had any problems.
Yeah, give me a few days
I was able to get a working setup using the current guide, however it didn't handle failure scenarios correctly (so it was load balanced, but not really highly available). With your setup, it doesn't look like you're checking api-server health, only whether it accepts TCP connections. If the api-server health check is failing, haproxy will still route traffic to those instances. An easy test is killing etcd on a node and observing that api-server is still running but returning an error code for /healthz (in which case it should be removed from the active backends in haproxy).
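As an illustration of the difference being described (the address, port, and backend names here are hypothetical, and these are standard haproxy directives rather than anything taken from the guide):

```
backend apiserver-tcp-only
    mode tcp
    # TCP-only check: only verifies that the port accepts connections, so an
    # api-server whose /healthz is failing (e.g. because etcd is down) stays in rotation.
    server cp1 192.0.2.11:6443 check

backend apiserver-healthz
    mode tcp
    # HTTP health check against /healthz over TLS: a non-2xx/3xx response
    # takes the server out of the active backends.
    option httpchk GET /healthz
    http-check expect status 200
    server cp1 192.0.2.11:6443 check check-ssl verify none
```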
You're probably right. At home I have a bare-metal cluster, and I routinely update the nodes, meaning that Kubernetes will be shut down during reboots. So far HAProxy has handled this situation without any hiccups, but let me have a look when I return home.
Did some further testing: this works for me.
Poked around and it looks like Ubuntu and recent versions of RHEL (and clones) are on HAProxy v2.4 (LTS), which appears to also work with the config I mentioned above.

OS - version (EOL date for community support)
- RHEL 7 - v1.5.18 (30 Jun 2024)
- Ubuntu 20.04 - v2.0.33 (02 Apr 2025)
The official documentation is incorrect on CentOS Stream 9. When I run:
What do you mean by "incorrect"? Your output mentions a warning but also says "Configuration file is valid" |
My bad, my wording was a bit severe. What I mean is that the official configuration produces warning messages, which can seem a bit disturbing, especially to newbies.
@PunkFleet if you remove the
Yup, it's valid and there are no warnings now.
If you find the warning bothersome, you can help us by sending a PR for the document.
I'd open a new issue and ideally a PR. When I did the updates earlier, I was trying to do the minimum to get certain failure scenarios working and didn't validate all of the config (I've never used haproxy before, so I don't have much experience with it). Looking over the haproxy docs, it seems
Is this a BUG REPORT or FEATURE REQUEST?
Choose one: BUG REPORT
Versions

kubeadm version (use `kubeadm version`):

Environment:
- Kubernetes version (use `kubectl version`):
- Cloud provider or hardware configuration: Ubuntu VMs
- Kernel (e.g. `uname -a`):
- Container runtime: cri-o
- Network plugin: Calico
What happened?
haproxy was still including nodes which returned non-200 health checks. I attempted to troubleshoot, but there have been significant changes since haproxy v2.1, so documentation isn't readily available. It seems most likely the health check was running over HTTP (plaintext) and ignoring the returned 400 error code. In addition, haproxy v2.1 is no longer officially supported.
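For context, the kube-apiserver serves only TLS on its secure port (6443 assumed here), so a plaintext HTTP probe never gets a meaningful /healthz answer; a quick way to see this from a control-plane node might be:

```bash
# Plaintext request to the TLS port: answered with a generic 400,
# not an actual /healthz result.
curl -i http://127.0.0.1:6443/healthz

# The same probe over TLS (certificate verification skipped for illustration).
curl -ik https://127.0.0.1:6443/healthz
```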
I also observed that the very low timeouts in haproxy lead to frequent termination of `kubectl ... -w` and `kubectl logs -f`.
Edit: I think it may also be possible that the `ssl-hello-chk` option in the guide is overriding `httpchk`, which would also explain the behavior I was seeing.

https://cbonte.github.io/haproxy-dconv/2.1/configuration.html#5.2-check
https://endoflife.date/haproxy
What you expected to happen?
The guide to provide a setup using software still supported (patched) by the vendor
How to reproduce it (as minimally and precisely as possible)?
Attempt to use the config from the guide with haproxy v2.8 (currently the default LTS); it fails due to syntax changes.
Anything else we need to know?
For the keepalived check, I ended up with
It's unclear in the guide why haproxy is only health checked if it's running on the node holding the VIP. With the guide's configuration, keepalived could move the VIP to a node with a working API server but a broken haproxy, which will then fail, and the VIP will be moved again. Additionally, the check doesn't hit the health endpoint, so the VIP could be moved to a node that's misconfigured but still responds to requests.
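As a rough sketch of a check along those lines (the script path and the haproxy frontend port 8443 are assumptions, not the reporter's actual configuration): every node runs the same `vrrp_script`, so keepalived will only hold or accept the VIP on a node whose local haproxy can actually reach a healthy API server.

```
vrrp_script check_haproxy {
    # Runs on every node, not only the current VIP holder.
    script "/etc/keepalived/check_haproxy.sh"
    interval 3
    fall 3
    rise 2
}
```

```bash
#!/usr/bin/env bash
# Hypothetical check_haproxy.sh: query the API server's /healthz through the
# local haproxy frontend (port 8443 assumed) and fail unless it returns 2xx/3xx.
curl --silent --fail --max-time 2 --insecure https://localhost:8443/healthz -o /dev/null \
    || { echo "*** /healthz via local haproxy failed" >&2; exit 1; }
```

The `vrrp_instance` block would then reference `check_haproxy` in its `track_script` section, as the existing guide does for its own check script.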
For haproxy 2.8, I ended up with the following (static pod)
I don't know much about haproxy, but I'm not sure how the health check worked before unless haproxy v2.1 didn't validate certificates by default.
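As an illustrative sketch only (addresses, ports, and server names are hypothetical, not the exact config from this report), a 2.x backend with an HTTPS health check against /healthz could look like this; `verify none` would also be consistent with the guess about certificate validation above:

```
frontend apiserver
    bind *:8443
    mode tcp
    option tcplog
    default_backend apiserver

backend apiserver
    mode tcp
    balance roundrobin
    option httpchk GET /healthz
    http-check expect status 200
    # check-ssl runs the health check over TLS; verify none skips certificate
    # verification so the cluster's internal CA does not have to be distributed here.
    server cp1 192.0.2.11:6443 check check-ssl verify none
    server cp2 192.0.2.12:6443 check check-ssl verify none
    server cp3 192.0.2.13:6443 check check-ssl verify none
```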
I reached these conclusions with the following tests:
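The kind of tests described earlier in the thread might look roughly like this (node address, ports, and the crictl invocation are assumptions based on the cri-o runtime listed above):

```bash
# Stop etcd on one control-plane node so api-server keeps running
# but its /healthz starts failing.
sudo crictl stop "$(sudo crictl ps --name etcd -q)"

# That node's api-server should now report unhealthy.
curl -k https://192.0.2.11:6443/healthz

# Requests through the load-balanced endpoint should still succeed,
# i.e. haproxy should have dropped the unhealthy backend.
kubectl --server https://<vip>:8443 get nodes
```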