how-to-use-sysctls-with-kata.md

Setting Sysctls with Kata

Sysctls

In Linux, the sysctl interface allows an administrator to modify kernel parameters at runtime. Parameters are available via the /proc/sys/ virtual process file system.

The parameters include the following subsystems among others:

  • fs (file systems)
  • kernel (kernel)
  • net (networking)
  • vm (virtual memory)

To get a complete list of kernel parameters, run:

$ sudo sysctl -a
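Because each parameter is exposed as a file under /proc/sys/, a dotted sysctl name can usually be turned into a path by replacing the dots with slashes. The following is a minimal sketch of that mapping. The helper names `sysctl_path` and `sysctl_read` are hypothetical, and the sketch assumes that no path component itself contains a dot (which can happen for some network interface names).

```shell
# Hypothetical helper: map a dotted sysctl name to its file under /proc/sys.
# Assumption: no path component contains a literal dot.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: read a parameter straight from procfs.
# For simple names this should show the same value as `sysctl -n NAME`.
sysctl_read() {
    cat "$(sysctl_path "$1")"
}
```

For example, `sysctl_path vm.max_map_count` prints `/proc/sys/vm/max_map_count`.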

Kubernetes provides mechanisms for setting namespaced sysctls, which can be set per pod. The following sysctls are known to be namespaced and can be set with Kubernetes:

  • kernel.shm*
  • kernel.msg*
  • kernel.sem
  • fs.mqueue.*
  • net.*
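The patterns in the list above can be checked with a shell `case` statement. This is only an illustrative sketch: the function name `is_namespaced_sysctl` is hypothetical, and the patterns are copied from the list rather than queried from Kubernetes, so they may not match what a given Kubernetes version accepts.

```shell
# Hypothetical helper: succeed (exit status 0) if the sysctl name matches
# one of the namespaced patterns listed above, fail (exit status 1) otherwise.
is_namespaced_sysctl() {
    case "$1" in
        kernel.shm*|kernel.msg*|kernel.sem|fs.mqueue.*|net.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With this sketch, `is_namespaced_sysctl kernel.shm_rmid_forced` succeeds, while `is_namespaced_sysctl vm.max_map_count` fails because `vm.*` parameters are not namespaced.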

Namespaced Sysctls:

Kata Containers supports setting namespaced sysctls with Kubernetes. All namespaced sysctls can be set in the same way as for regular Linux-based containers. The difference is that with Kata they are set inside the guest.

Setting Namespaced Sysctls with Kubernetes:

Kubernetes considers certain sysctls safe and others unsafe. For detailed information about which sysctls are considered unsafe, refer to the Kubernetes sysctl docs. To use unsafe sysctls, the cluster admin must explicitly allow them on the kubelet:

$ kubelet --allowed-unsafe-sysctls 'kernel.msg*,net.ipv4.route.min_pmtu' ...

or, declaratively, through a kubeadm configuration file:

$ cat kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1alpha3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    allowed-unsafe-sysctls: "kernel.msg*,kernel.shm*,net.*"
...

The above YAML can then be passed to kubeadm init as:

$ sudo -E kubeadm init --config=kubeadm.yaml

Both safe and unsafe sysctls can be enabled in the same way in the Pod YAML:

apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
spec:
  securityContext:
    sysctls:
    - name: kernel.shm_rmid_forced
      value: "0"
    - name: net.ipv4.route.min_pmtu
      value: "1024"

Non-Namespaced Sysctls:

Kubernetes does not allow non-namespaced sysctls to be set through the Pod spec. The recommendation is to set them directly on the host or to use a privileged container.

In the case of Kata, setting sysctls on the host does not work, because host sysctls have no effect on a Kata Container running inside a guest. Instead, Kata lets you set non-namespaced sysctls from a privileged container. This has the advantage that the non-namespaced sysctls are set inside the guest and do not affect the /proc/sys values of any other pod or of the host itself.

The recommended approach is to set the sysctl value in a privileged init container, so that the application containers do not need any elevated privileges:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-kata
spec:
  runtimeClassName: kata-qemu
  securityContext:
    sysctls:
    - name: kernel.shm_rmid_forced
      value: "0"
  containers:
  - name: busybox-container
    # No elevated privileges here; the privileged init container sets the sysctl.
    image: debian
    command:
    - sleep
    - "3000"
  initContainers:
  - name: init-sys
    securityContext:
      privileged: true
    image: busybox
    command: ['sh', '-c', 'echo "64000" > /proc/sys/vm/max_map_count']
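To confirm that the init container's write took effect inside the guest, the value can be read back through `kubectl exec`. The sketch below is one way to do that. The helper name `check_pod_sysctl` is hypothetical, and it assumes that `kubectl` is configured for the cluster, that the pod is running, and that the container image provides `cat`.

```shell
# Hypothetical helper: compare a sysctl value seen inside a pod with an expected value.
# Returns 0 on match, 1 on mismatch, 2 if the kubectl call itself fails.
check_pod_sysctl() {
    pod=$1
    name=$2
    expected=$3
    # Translate the dotted name into its /proc/sys path (assumes no dots inside components).
    path="/proc/sys/$(printf '%s' "$name" | tr '.' '/')"
    actual=$(kubectl exec "$pod" -- cat "$path") || return 2
    [ "$actual" = "$expected" ]
}
```

For the pod above, `check_pod_sysctl busybox-kata vm.max_map_count 64000` should succeed once the pod is running. Reading /proc/sys/vm/max_map_count on the host should still show the host's own value, since the write happened inside the guest.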