
Talos on RPi

rpi k8s cluster via talos!

In the image above: 7 RPis (3 control, 3 workers, 1 dedicated pi-hole on wifi, lulz), plus a ZimaBlade x86 worker attached to a 10TB HDD

  • Leveraging talos/k8s with the pi.hole stack.
  • Talos is pretty cool.
  • Pretending to be a 1337 dev rack operator with all the smex appeal.
  • Playground for tinkering w/ on-prem bare-metal clusters; eventually I want to start adding various node types for different workloads (e.g. nodes with CUDA GPUs)
  • K8s playground because I'm bad.
  • Using cluster to run personal automations (plant waterer, ambient sounds via GitHub activity, etc)
  • Playground for moar app development, AI/ML/mining.
  • Control & automate all installations of core services via helmfile

DISCLAIMER: also using this to learn K8s so things are going to look dumb.

Prerequisites

Acknowledge Caveats

DO NOT EXPOSE THIS SETUP TO THE WWW

I currently have some DNS names that are hard-coded in the ingresses for the bootstrapped services in this setup.

Specifically:

  • home.k8s points to a subset of my k8s nodes (for general cluster access)
  • I access everything via NodePort -- didn't want to set up TLS/SSL & a reverse proxy (yet)
  • all traffic is accessed via the kubernetes ingress-nginx controller
    • https is on 30443
    • http is on 30080
  • all services are behind CNAMEs to home.k8s
    • the dashboard is at https://dashboard.home.k8s:30443
    • longhorn is at https://longhorn.home.k8s:30443
    • etc
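
e.g. once the DNS below is in place, hitting a service from inside the LAN looks something like this (hypothetical; -k because there's no trusted TLS here yet):

curl -k https://dashboard.home.k8s:30443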

This doesn't sit well with me either, so stay tuned for an actual solution that's...close to fully automated. Maybe.

That being said, if you're just starting out and/or have your own vLAN and a way of isolating cluster instances/machines, everything here should work out of the box if you have the local DNS setup described above. Specifically, your dnsmasq rules should be something like:

cname=longhorn.home.k8s,home.k8s
cname=dashboard.home.k8s,home.k8s
cname=prometheus.home.k8s,home.k8s
cname=grafana.home.k8s,home.k8s
cname=docker.home.k8s,home.k8s

and home.k8s is an A-record pointing to your control-node IPs. This can easily be set up via Pi.hole and its built-in local DNS settings/feature.
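
For reference, with Pi-hole v5 those Local DNS records land in a hosts-format file (the path and IPs below are assumptions; swap in your actual control-node IPs):

# /etc/pihole/custom.list -- hypothetical control-node IPs
192.168.0.101 home.k8s
192.168.0.102 home.k8s
192.168.0.103 home.k8s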

Hardware

  • A couple of machines that you can install TalosOS on.
    • Ideally >= RPi Model 3B
  • A couple of micro SD cards for these machines. Or if you've got bank, RPi HATs with hard drives attached.
    • if you have EXTRA hard drives on any machines, make sure that they're formatted appropriately. Check out the make.format-disc target (rough sketch of the idea after this list).
  • A primary work machine:
    • should be able to flash your boot devices
    • either dd, or Rufus for you Windows folk
  • it helps IMMENSELY if you have a small screen that you can attach/detach quickly for your RPis. I use this one.
    • make sure you have the right cables
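
Not the actual make.format-disc target, just a sketch of the general idea for prepping an extra drive (assuming /dev/sdX is the extra drive and ext4 works for your use case):

# DOUBLE-CHECK the device name via lsblk first
sudo wipefs -a /dev/sdX
sudo mkfs.ext4 -L storage /dev/sdX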

The rest is really up to you. You can get fancy with switches, if you're using proxmox you can set up your own vLAN with reserved IPs for VMs. Whatever.

Software (Prep/Configuration)

  • For these machines, make sure that you can address them via static IPs and/or hostnames on your LAN.
    • if you're already running pi-hole you can do this via the Local DNS Configuration Page
    • if you just want something up and running, you can reserve IPs in your router DHCP settings
  • Whether you're imaging an ISO onto a bootable USB drive, SD card, DVD, whatever - you will need the appropriate image.
  • make sure that you have the appropriate tools installed. Running make in this repository will do a check for you and give you pointers on how to install anything that's missing.
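
At minimum you'll want talosctl on your work machine; one way to grab it (per the Talos docs):

curl -sL https://talos.dev/install | sh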

Initial Setup

As said above, I HIGHLY recommend running at least a Raspberry Pi Model 3B for your cluster. It seems to have specs that result in smoother boot-up speeds, responsiveness, etc. I personally have Model 4Bs and the overall experience has been very smooth.

Steps:

  1. flash the image for your machines onto their respective devices.

    • for *nix folks:
      • determine your <device blk/name> based on lsblk. BE CAREFUL HERE, YOU CAN WIPE YOUR MAIN DRIVE LOL
      •   wget https://factory.talos.dev/image/f8a903f101ce10f686476024898734bb6b36353cc4d41f348514db9004ec0a9d/v1.7.5/metal-arm64.raw.xz
          xz -d metal-arm64.raw.xz
          dd if=metal-arm64.raw of=<device blk/name> conv=fsync bs=4M
          eject <device blk/name>
        
      • might have to sudo for the above
  2. export TALOSCONFIG=$(pwd)/talos/talosconfig

    • this will make the next couple of steps easier
    • if you use direnv, cp envrc.sample .envrc and direnv allow to make this automatic in the future
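    • envrc.sample presumably boils down to that same export, something like:

        export TALOSCONFIG=$(pwd)/talos/talosconfig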
  3. pick a pretty pretty VIP machine. Actually any machine will do. This node/IP will be your first K8s control node.

    • at this point we're really just following the guide <- AT LEAST SKIM OVER THIS
    • cd talos -- so that everything generated ends up in this dir
    • define your VIP control endpoint. e.g.: https://<VIP Node IP>:6443
    • figure out a name that you want to give this cluster. e.g.: <cluster-name>
    • talosctl gen config <cluster-name> https://<VIP Node IP>:6443
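    • e.g. with made-up values plugged in:

        # hypothetical cluster name + VIP node IP
        talosctl gen config home-cluster https://192.168.0.101:6443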
  4. update the install device on the node:

    # THIS IS APPLICABLE FOR BOTH worker.yaml AND controlplane.yaml !
    machine:
        # ...
        # the stuff from above
        # ...
        # this already exists
        install:
            disk: /dev/mmcblk0 # this should be your boot device. find it via `talosctl -n <node ip> disks`
            image: ghcr.io/siderolabs/installer:v1.7.5
            wipe: false
  5. OPTIONAL: for longhorn, update your generated worker config to include the required mounts (I recommend cp worker.yaml worker-longhorn.yaml for these kinds of changes)

    machine:
        kubelet:
            extraMounts:
              # separate reserved mount for longhorn itself
              # this will use the machine's local disk for longhorn PVs/PVCs
              - destination: /var/lib/longhorn
                type: bind
                source: /var/lib/longhorn
                options:
                  - bind
                  - rshared
                  - rw
    
    
              # EVERYTHING BELOW IS OPTIONAL IF YOU HAVE AN EXTERNAL HDD
              # and AGAIN - make sure it's formatted appropriately with something
              # akin to the make.format-disc target
              - destination: /mount/storage
                type: bind
                source: /mount/storage
                options:
                  - bind
                  - rshared
                  - rw
        disks:
          # path fetched via
          # talosctl -n <node IP> disks
          - device: /dev/sda
            partitions:
              - mountpoint: /mount/storage
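
    Later, when applying worker configs (see the commands after these steps), point your longhorn-capable workers at this variant instead of the plain worker.yaml:

    talosctl apply-config --insecure \
        --nodes <Node IP> \
        --file worker-longhorn.yaml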
  6. Bootstrap your first node:

    talosctl apply-config --insecure \
        --nodes <Node IP> \
        --file controlplane.yaml
    
    talosctl bootstrap --nodes <Node IP> --endpoints <Node IP> \
      --talosconfig=./talosconfig
  7. Wait for your first control node to become healthy/ready in the dmesg logs (on the attached screen)
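
    No screen attached? You can also poll from your work machine, e.g.:

    # built-in readiness check
    talosctl health --nodes <Node IP> --endpoints <Node IP> \
        --talosconfig=./talosconfig

    # or tail the kernel logs remotely
    talosctl dmesg --nodes <Node IP> --endpoints <Node IP> \
        --talosconfig=./talosconfig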

And at this point, you just have to apply the machine configs to the various machines!

talosctl --nodes <comma-delimited list of controlplane node IPs> \
    apply-config -f controlplane.yaml --insecure


talosctl --nodes <comma-delimited list of WORKER node IPs> \
    apply-config -f worker.yaml --insecure

and to play around with it, run talosctl kubeconfig --nodes <Node IP> --endpoints <Node IP> to merge this cluster's kubeconfig into your existing config.

Instructions on how to generate a separate kubeconfig.yaml file are in the docs HERE (short version below)
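
Something along these lines (worth double-checking the flags against the docs):

# writes a standalone kubeconfig.yaml instead of merging into ~/.kube/config
talosctl kubeconfig ./kubeconfig.yaml --merge=false \
    --nodes <Node IP> --endpoints <Node IP>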

Going Further

I use helmfile to install a bunch of "essentials". The make sync and make apply targets will execute those respective helmfile operations via k8s/helmfile.yaml. This is how I'll be adding more "core" services to the cluster in the future. Take a look, things should work out of the box (except those hostname caveats above).

If you want to try out specific features/application setups, you can install them via:

helmfile sync --selector name=docker-registry

and name= can be replaced with any service in the helmfile.yaml that you would like to test out.
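
For reference, entries in k8s/helmfile.yaml follow the usual helmfile shape; a hypothetical docker-registry release (repo/chart names here are illustrative, check the actual file):

repositories:
  - name: twuni
    url: https://helm.twun.io

releases:
  - name: docker-registry
    namespace: docker-registry
    chart: twuni/docker-registry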

TODO: moar details to come.

Troubleshooting

  • Use RPi Imager to wipe and troubleshoot any defective cards before dding your image
    • insert SD card into system
    • select your device
    • for OS -> Misc utility images -> Bootloader -> SD Card Boot
    • select your SD card, flash it
  • SLOT THE SD CARD INTO YOUR RPi
    • VERIFY THAT ON POWERUP THE GREEN LED CONTINUOUSLY FLASHES and/or THE HDMI SCREEN IS FULLY GREEN.
    • after this, poweroff/unplug the RPi, take out your SD card
  • try dding the image again via the above

The rest of the setup is pretty much in the guide. I'm keeping extensive notes and util scripts in the Makefile though, to record what's been working for me. The guides are still a bit messy to follow.