Using Kubernetes with a cloud provider is quite magical. Many things just work.
On bare metal, however, it involves less magic and more work ;p
(See NGINX Ingress Controller - Bare-metal considerations)
To route inbound traffic to the ingress, we've chosen to use a non-containerized NGINX reverse proxy (RP).
It matches the "Using a self-provisioned edge" section of the documentation.
This NGINX RP is used for:
- SSL termination
- Authentication when needed
- Basic load balancing across the various nodes
It also provides error logs that are parsed by CrowdSec.
The NGINX Ingress is exposed on a dedicated IP provided by MetalLB, and the RP merely forwards requests to this address.
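For reference, exposing the ingress controller on a fixed MetalLB address looks roughly like the sketch below (the namespace, address pool and IP are illustrative, not the actual values of this cluster):

```yaml
# Sketch: pin the NGINX Ingress controller Service to a MetalLB-managed address.
# The namespace, address pool and IP are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    metallb.universe.tf/address-pool: default   # which MetalLB pool to allocate from
spec:
  type: LoadBalancer
  loadBalancerIP: 192.168.1.240                 # the dedicated IP the RP forwards to
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: http
    - name: https
      port: 443
      targetPort: https
```

The RP's upstream configuration then simply points at that address.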
Kubernetes doesn't provide a dynamic storage provisioner on bare-metal installations.
To keep things simple, you can use a hostPath provisioner. However, it statically binds a path to a container, so it's far from a dynamic system.
NFS would also be a simple solution; however, it would be yet another SPOF in the system, and it would offer poor I/O.
At the other end of the spectrum, you can use a network filesystem with a dynamic provisioner (GlusterFS or CephFS), but for a small home installation it's probably overkill (more on that later ;p).
To find a middle ground, the local volume type allows Volumes to be created statically while still being claimed dynamically by pods.
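As a minimal sketch (the name, path, node and size are illustrative), a local Volume is declared statically and pinned to the node that actually holds the data:

```yaml
# Sketch of a statically created "local" PersistentVolume.
# Name, size, path and node are illustrative values.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-awesome-app-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/volumes/my-awesome-app
  nodeAffinity:            # local volumes must be pinned to the node holding the data
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node1
```

It is usually paired with a StorageClass that has `provisioner: kubernetes.io/no-provisioner` and `volumeBindingMode: WaitForFirstConsumer`, so a claim only binds once its pod is scheduled.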
Scripting the creation of local volumes is trivial (`mkdir` and `kubectl`). However, using plain folders means that the `capacity` attribute of volumes is not enforced: a single rogue container can fill the whole host filesystem that contains the volume.
To have a more robust system, we've used LVM Logical Volumes (LVs).
The Ansible playbook creates a dedicated LV for each Volume. That way, a container can't use more than the allocated size.
Using LVs also means that we can easily extend an existing volume if the space requirements grow over time.
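The playbook itself isn't reproduced here, but the per-volume tasks boil down to something like this sketch (the volume group, size and mount point are made-up values):

```yaml
# Sketch of the per-volume Ansible tasks: create the LV, format it, mount it.
# vg0, the 5g size and the mount point are illustrative.
- name: Create a dedicated logical volume
  community.general.lvol:
    vg: vg0
    lv: my-awesome-app
    size: 5g

- name: Format the logical volume
  community.general.filesystem:
    fstype: ext4
    dev: /dev/vg0/my-awesome-app

- name: Mount it where the local PersistentVolume expects it
  ansible.posix.mount:
    path: /mnt/volumes/my-awesome-app
    src: /dev/vg0/my-awesome-app
    fstype: ext4
    state: mounted
```

Growing a volume later is then mostly a matter of bumping the size and resizing the filesystem.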
As we create all the Volumes before creating the PersistentVolumeClaims, we can't be sure that the prepared volumes are bound to the correct pods.
To make sure of it, we use labels.
Volumes are tagged in the same way as other Kubernetes objects, using the usual `app` and `tier` labels.
For example, if we have a stateful application that has both a frontend and a MySQL DB, the pods and associated volumes will have the tags:

- Frontend: `app.kubernetes.io/name: my-awesome-app`, `app.kubernetes.io/component: my-awesome-app`
- MySQL: `app.kubernetes.io/name: my-awesome-app`, `app.kubernetes.io/component: mysql`
The PersistentVolumeClaim will use these same labels as selectors:
volumeClaimTemplates:
  - metadata:
      name: my-awesome-app-pv-claim
    spec:
      [...]
      selector:
        matchLabels:
          app.kubernetes.io/name: my-awesome-app
          app.kubernetes.io/component: my-awesome-app
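On the Volume side, those labels simply sit in the PersistentVolume metadata (continuing the illustrative sketch from above):

```yaml
# The statically created PersistentVolume carries the labels the claim selects on.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-awesome-app-pv
  labels:
    app.kubernetes.io/name: my-awesome-app
    app.kubernetes.io/component: my-awesome-app
spec:
  [...]
```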
We can't really schedule all our applications on any Kubernetes node.
- Some applications require dedicated hardware on the machine (home automation requires USB sticks to be plugged in)
- Some nodes are far less powerful (video transcoding on an Atom CPU is not a good idea)
To avoid those issues without hard-pinning the applications to dedicated nodes, we use a looser tagging system.
Nodes are tagged according to their capabilities. For example:
labels:
  capability/general-purpose: "yes"
  capability/home: "no"
Accordingly, the pods use a nodeSelector:
nodeSelector:
  capability/general-purpose: 'yes'
As I said earlier, a distributed storage solution is probably overkill.
But that shouldn't stop us from using it!
Ideally, GlusterFS would be deployed using the new Gluster Container Storage project.
It uses an Operator to automatically deploy Glusterd2 containers, a dynamic volume provisioner, a Prometheus exporter... pretty much the whole stack.
But it's not stable yet.
A standalone GlusterFS cluster with Heketi is another solution.
However, Heketi has a few requirements that I don't particularly like.
I would rather avoid having a container that can SSH to all my nodes and then passwordlessly sudo whatever it wants.
In any case, the in-tree GlusterFS volume type has been deprecated in Kubernetes 1.25 :/
Longhorn is an easy way to manage distributed storage on Kubernetes.
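As a rough sketch of what that gives us (the class name, replica count and claim size are illustrative), a StorageClass backed by Longhorn's CSI driver provides dynamic provisioning with replication, so claims no longer need pre-created Volumes:

```yaml
# Sketch: a Longhorn-backed StorageClass and a claim using it.
# The class name, replica count and requested size are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-awesome-app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-replicated
  resources:
    requests:
      storage: 5Gi
```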