
Bare metal

Using Kubernetes with a cloud provider is quite magical. Many things just work.

On bare metal, however, there is less magic and more work ;p

Network

(See NGINX Ingress Controller - Bare-metal considerations)

To route inbound traffic to the ingress, we've chosen to use a non-containerized NGINX reverse proxy.

It matches the "Using a self-provisioned edge" section of the documentation.

(Diagram: Architecture - Incoming web traffic)

This NGINX RP is used for:

  • SSL termination
  • Authentication when needed
  • Basic load balancing across the various nodes

It also provides error logs parsed by Crowdsec.

The NGINX Ingress is exposed on a dedicated IP provided by MetalLB, and the RP merely forwards requests to this address.

(Diagram: Architecture - Incoming web traffic - Zoom on ReverseProxy)
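
As a sketch of that wiring (service name, namespace and IP address are assumptions, not the actual setup), the ingress controller sits behind a LoadBalancer Service whose address MetalLB announces on the LAN:

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller        # assumed name
  namespace: ingress-nginx              # assumed namespace
spec:
  type: LoadBalancer
  loadBalancerIP: 192.168.1.240         # assumed dedicated IP taken from the MetalLB pool
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443

The non-containerized RP then simply proxies everything to that address on ports 80/443.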

Storage

Kubernetes doesn't provide a dynamic storage provisioner out of the box on bare metal installations.

Many bare metal options

To keep things simple, you can use a hostPath provisioner. However, it statically binds a path to a container, so it's far from a dynamic system.

NFS would also be a simple solution; however, it would be yet another SPOF in the system, and it would offer poor IO.

At the other end of the spectrum, you can use a distributed filesystem with a dynamic provisioner (GlusterFS or CephFS), but for a small home installation it's probably overkill (more on that later ;p).

To find a middle ground, the local volume type allows the static creation of Volumes, which pods can then claim dynamically.
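
As a minimal sketch (storage class name, path, size and node name are assumptions), a statically created local PersistentVolume looks like this; note that local volumes must be pinned to a node through nodeAffinity:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv
spec:
  capacity:
    storage: 10Gi                      # assumed size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage      # assumed class, must match the claim
  local:
    path: /mnt/volumes/example         # assumed mount point on the node
  nodeAffinity:                        # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1               # assumed node name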

On to local volumes

Scripting the creation of local volumes is trivial (mkdir and kubectl). However, using plain folders means that the capacity attribute of volumes is not enforced: a single rogue container can fill the whole host filesystem that contains the volume.

To have a more robust system, we've used LVM LogicalVolumes.

The Ansible playbook creates a dedicated LV for each Volume. That way, a container can't use more than the allocated size.

Using LVs means that we can easily extend an existing volume if the space requirements increase over time.
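
A rough sketch of what such a playbook can look like (the volume group, sizes and mount points are assumptions, not the actual playbook):

- name: Create a dedicated LV per Volume
  community.general.lvol:
    vg: vg-k8s                                   # assumed volume group
    lv: "{{ item.name }}"
    size: "{{ item.size }}"
  loop: "{{ local_volumes }}"

- name: Format each LV
  community.general.filesystem:
    fstype: ext4
    dev: "/dev/vg-k8s/{{ item.name }}"
  loop: "{{ local_volumes }}"

- name: Mount each LV where the PersistentVolume expects it
  ansible.posix.mount:
    path: "/mnt/volumes/{{ item.name }}"         # matches the local.path of the PV
    src: "/dev/vg-k8s/{{ item.name }}"
    fstype: ext4
    state: mounted
  loop: "{{ local_volumes }}"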

Local volume tagging

As we create all the Volumes before creating the PersistentVolumeClaims, there is no guarantee that each prepared volume ends up bound to the correct pod.

To make sure it does, we use labels.

Volumes are tagged in the same way as other Kubernetes objects, using the usual app.kubernetes.io/name and app.kubernetes.io/component labels.

For example, if we have a stateful application with both a frontend and a MySQL DB, the pods and associated volumes will carry the following labels:

  • app.kubernetes.io/name: my-awesome-app, app.kubernetes.io/component: my-awesome-app
  • app.kubernetes.io/name: my-awesome-app, app.kubernetes.io/component: mysql
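
Concretely, each prepared PersistentVolume carries the matching labels in its metadata (a sketch, reusing the assumed layout from above):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-awesome-app-pv
  labels:
    app.kubernetes.io/name: my-awesome-app
    app.kubernetes.io/component: my-awesome-app
spec:
  [...]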

The PersistentVolumeClaim will use these same labels as selectors:

volumeClaimTemplates:
- metadata:
    name: my-awesome-app-pv-claim
  spec:
    [...]
    selector:
      matchLabels:
        app.kubernetes.io/name: my-awesome-app
        app.kubernetes.io/component: my-awesome-app

Host tagging

We can't schedule every application on just any Kubernetes node:

  • Some applications require dedicated hardware on the machine (home automation requires USB sticks to be plugged in)
  • Some nodes are far less powerful (video transcoding on an Atom CPU is not a good idea)

To avoid these issues without hard-pinning the applications to dedicated nodes, we use a looser tagging system.

Nodes are tagged according to their capabilities. For example:

labels:
  capability/general-purpose: "yes"
  capability/home: "no"

Accordingly, the pods use a nodeSelector:

nodeSelector:
  capability/general-purpose: 'yes'
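
For context, this selector sits in the workload's pod template, e.g. in a Deployment (a sketch with assumed names and image):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-awesome-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: my-awesome-app
  template:
    metadata:
      labels:
        app.kubernetes.io/name: my-awesome-app
    spec:
      nodeSelector:
        capability/general-purpose: "yes"
      containers:
        - name: my-awesome-app
          image: my-awesome-app:latest   # assumed image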

GlusterFS (deprecated)

As I said earlier, a distributed storage solution is probably overkill.

But that shouldn't stop us from using it!

Long term target

Ideally, GlusterFS will be deployed using the new Gluster Container Storage project.

It uses an Operator to automatically deploy Glusterd2 containers, a dynamic volume provisioner, a Prometheus exporter ... pretty much the whole stack.

But it's not stable yet.

Heketi

A standalone GlusterFS cluster with Heketi is another solution.

However, Heketi has a few requirements that I don't particularly like.

I would rather avoid having a container that can SSH into all my nodes and then passwordlessly sudo whatever it wants.

Deprecation

However, the GlusterFS volume type has been deprecated in Kubernetes 1.25 :/

Longhorn

Longhorn is an easy way to manage distributed block storage on Kubernetes.