This document provides troubleshooting tips for installing and using Sysbox on Docker hosts.
For troubleshooting in Kubernetes clusters, see here.
- Sysbox Installation Problems
- Docker reports Unknown Runtime error
- Unprivileged User Namespace Creation Error
- Bind Mount Permissions Error
- Failed to Setup Docker Volume Manager Error
- Failed to register with sysbox-mgr or sysbox-fs
- Docker reports failure setting up ptmx
- Docker exec fails
- Sysbox Logs
- The /var/lib/sysbox is not empty even though there are no containers
- Kubernetes-in-Docker fails to create pods
- Core-Dump generation
When installing the Sysbox package with the apt-get install command (see the Installation instructions), the expected output is:
$ sudo apt-get install ./sysbox-ce_0.5.0-0.linux_amd64.deb
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'sysbox-ce' instead of './sysbox-ce_0.5.0-0.linux_amd64.deb'
The following NEW packages will be installed:
sysbox-ce
0 upgraded, 1 newly installed, 0 to remove and 178 not upgraded.
Need to get 0 B/11.2 MB of archives.
After this operation, 40.8 MB of additional disk space will be used.
Get:1 /home/rmolina/wsp/02-26-2020/sysbox/sysbox-ce_0.5.0-0.linux_amd64.deb sysbox-ce amd64 0.5.0-0.linux [11.2 MB]
Selecting previously unselected package sysbox-ce.
(Reading database ... 327292 files and directories currently installed.)
Preparing to unpack .../sysbox-ce_0.5.0-0.linux_amd64.deb ...
Unpacking sysbox-ce (0.5.0-0.linux) ...
Setting up sysbox-ce (0.5.0-0.linux) ...
Created symlink /etc/systemd/system/sysbox.service.wants/sysbox-fs.service → /lib/systemd/system/sysbox-fs.service.
Created symlink /etc/systemd/system/sysbox.service.wants/sysbox-mgr.service → /lib/systemd/system/sysbox-mgr.service.
Created symlink /etc/systemd/system/multi-user.target.wants/sysbox.service → /lib/systemd/system/sysbox.service.
If any software dependency is missing, apt-get will install it automatically during the installation process. Alternatively, you can install Sysbox's missing dependencies manually with:
$ sudo apt-get update
$ sudo apt-get install -f -y
Other issues may show up during installation. For example, in Docker environments the Sysbox installer may complain if there are active Docker containers at install time. In this case, execute the action suggested by the installer and re-launch the installation:
$ sudo apt-get install ./deb/build/amd64/ubuntu-impish/sysbox-ce_0.5.0-0.linux_amd64.deb
Reading package lists... Done
Building dependency tree
...
The Sysbox installer requires a docker service restart to configure network parameters, but it cannot proceed due to existing Docker containers. Please remove them as indicated below and re-launch the installation process. Refer to Sysbox installation documentation for details.
"docker rm $(docker ps -a -q) -f"
dpkg: error processing package sysbox-ce (--configure):
installed sysbox-ce package post-installation script subprocess returned error exit status 1
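If removing all existing containers on the host is acceptable (which is what the installer's suggested command does), you can for example run:
$ docker rm $(docker ps -a -q) -f
$ sudo apt-get install ./sysbox-ce_0.5.0-0.linux_amd64.deb
and the installation should then complete normally.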
Upon successful completion of the installation, verify that Sysbox's systemd units have been properly installed and that the associated daemons are running:
$ systemctl list-units -t service --all | grep sysbox
sysbox-fs.service loaded active running sysbox-fs component
sysbox-mgr.service loaded active running sysbox-mgr component
sysbox.service loaded active exited Sysbox General Service
The sysbox.service is ephemeral (it exits once it launches the other Sysbox services), so the active exited status above is expected.
When creating a system container, Docker may report the following error:
$ docker run --runtime=sysbox-runc -it ubuntu:latest
docker: Error response from daemon: Unknown runtime specified sysbox-runc.
This indicates that the Docker daemon is not aware of the Sysbox runtime.
This is likely due to one of the following reasons:
1. Docker is installed via an Ubuntu snap package.
2. Docker is installed natively, but its daemon configuration file (/etc/docker/daemon.json) has an error.
For (1):
At this time, Sysbox does not support Docker installations via snap. See here for info on how to overcome this.
For (2):
The /etc/docker/daemon.json file should have an entry for sysbox-runc as follows:
{
"runtimes": {
"sysbox-runc": {
"path": "/usr/bin/sysbox-runc"
}
}
}
Double check that this is the case. If not, change the file and restart Docker:
$ sudo systemctl restart docker.service
NOTE: The Sysbox installer automatically does this configuration and restarts Docker. Thus this error is uncommon.
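To double-check that Docker picked up the runtime, you can query the daemon (the exact output format varies across Docker versions); sysbox-runc should appear in the list of runtimes:
$ docker info | grep -i runtime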
When creating a system container, Docker may report the following error:
docker run --runtime=sysbox-runc -it ubuntu:latest
docker: Error response from daemon: OCI runtime create failed: host is not configured properly: kernel is not configured to allow unprivileged users to create namespaces: /proc/sys/kernel/unprivileged_userns_clone: want 1, have 0: unknown.
This means that the host's kernel is not configured to allow unprivileged users to create user namespaces.
For Ubuntu, fix this with:
sudo sh -c "echo 1 > /proc/sys/kernel/unprivileged_userns_clone"
Note: The Sysbox package installer automatically executes this instruction, so normally there is no need to do this configuration manually.
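Note that this sysctl knob exists on Debian/Ubuntu kernels. To make the setting persist across reboots, one option is to add a sysctl drop-in file (the file name below is just an example):
$ echo "kernel.unprivileged_userns_clone = 1" | sudo tee /etc/sysctl.d/99-userns.conf
$ sudo sysctl --system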
The host's /etc/subuid and /etc/subgid files contain the host user-id and group-id ranges that Sysbox assigns to the containers. These files should have a single entry for user sysbox that looks similar to this:
$ more /etc/subuid
sysbox:165536:65536
If for some reason this file has more than one entry for user sysbox, you'll see the following error when creating a container:
docker: Error response from daemon: OCI runtime create failed: error in the container spec: invalid user/group ID config: sysbox-runc requires user namespace uid mapping array have one element; found [{0 231072 65536} {65536 296608 65536}]: unknown.
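To check for this condition, list the sysbox entries in both files; a single line per file is expected. If you find duplicates, remove the extra lines and restart Sysbox:
$ grep sysbox /etc/subuid /etc/subgid
$ sudo systemctl restart sysbox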
When running a system container with a bind mount, you may see that the files and directories associated with the mount have nobody:nogroup ownership when listed from within the container.
This typically occurs when the source of the bind mount is owned by a user on the host that is different from the user on the host to which the system container's root user maps. Recall that Sysbox containers always use the Linux user namespace and thus map the root user in the system container to a non-root user on the host.
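To confirm this is the cause, compare the numeric ownership of the bind-mount source on the host with the host uid that the container's root user maps to (my-syscont and /path/to/source below are placeholders):
$ ls -n /path/to/source
$ docker exec my-syscont cat /proc/self/uid_map
The second column of uid_map is the host uid that uid 0 in the container maps to; if it doesn't match the owner of the bind-mount source, the files show up as nobody:nogroup inside the container.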
See here for info on how to overcome this.
When creating a system container, Docker may report the following error:
docker run --runtime=sysbox-runc -it ubuntu:latest
docker: Error response from daemon: OCI runtime create failed: failed to setup docker volume manager: host dir for docker store /var/lib/sysbox/docker can't be on ..."
This means that Sysbox's /var/lib/sysbox directory is on a filesystem not supported by Sysbox.
This directory must be on one of the following filesystems:
- ext4
- btrfs
The same requirement applies to the /var/lib/docker
directory.
This is normally the case for vanilla Ubuntu installations, so this error is not common.
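You can verify which filesystem backs these directories with findmnt (or df -T):
$ findmnt -n -o FSTYPE --target /var/lib/sysbox
$ findmnt -n -o FSTYPE --target /var/lib/docker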
While creating a system container, Docker may report the following error:
$ docker run --runtime=sysbox-runc -it alpine
docker: Error response from daemon: OCI runtime create failed: failed to register with sysbox-mgr: failed to invoke Register via grpc: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /run/sysbox/sysmgr.sock: connect: connection refused": unknown.
or
docker run --runtime=sysbox-runc -it alpine
docker: Error response from daemon: OCI runtime create failed: failed to pre-register with sysbox-fs: failed to register container with sysbox-fs: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /run/sysbox/sysfs.sock: connect: connection refused": unknown.
This likely means that the sysbox-mgr and/or sysbox-fs daemons are not running.
Check that these are running via systemd:
$ systemctl status sysbox-mgr
$ systemctl status sysbox-fs
If either of these services is not running, use systemd to restart them:
$ sudo systemctl restart sysbox
Normally systemd ensures these services are running and restarts them automatically if they stop for some reason.
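If the daemons fail to stay up, the systemd journal usually shows why, for example:
$ sudo journalctl -u sysbox-fs -u sysbox-mgr -e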
The following error may be reported within a system container or any of its inner (child) containers:
# ls /proc/sys
ls: cannot access '/proc/sys': Transport endpoint is not connected
This error usually indicates that the sysbox-fs daemon (and potentially sysbox-mgr too) was restarted after the affected system container was created. In this scenario, you must recreate (stop and start) all active Sysbox containers.
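For example, assuming a container named my-syscont (a placeholder name), you can confirm it uses the Sysbox runtime and then stop and start it:
$ docker inspect -f '{{.HostConfig.Runtime}}' my-syscont
sysbox-runc
$ docker restart my-syscont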
When creating a system container with Docker + Sysbox, if Docker reports an error such as:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:364: starting container process caused "process_linux.go:533: container init caused \"rootfs_linux.go:67: setting up ptmx caused \\\"remove dev/ptmx: device or resource busy\\\"\"": unknown.
It likely means the system container was launched with the Docker --privileged flag (this flag is not compatible with Sysbox, as described here).
You may hit this problem when doing a docker exec -it my-syscont bash:
OCI runtime exec failed: exec failed: container_linux.go:364: starting container process caused "process_linux.go:94: executing setns process caused \"exit status 2\"": unknown
This occurs if the /proc mount inside the system container is set to read-only. For example, it happens if you launched the system container and ran the following command in it:
$ mount -o remount,ro /proc
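If that's what happened, you can revert it from inside the container (the default /proc mount in a system container is read-write):
$ mount -o remount,rw /proc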
The Sysbox daemons (i.e., sysbox-fs and sysbox-mgr) log information related to their activities in /var/log/sysbox-fs.log and /var/log/sysbox-mgr.log respectively. These logs are useful during troubleshooting exercises.
You can modify the log file location, log level, and log format. See here and here for more info.
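For example, to quickly scan both daemon logs for errors or warnings (adjust the paths if you changed the log location or format):
$ sudo grep -iE 'error|warn' /var/log/sysbox-fs.log /var/log/sysbox-mgr.log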
For sysbox-runc, logging is handled as follows:
- When running Docker + sysbox-runc, the sysbox-runc logs are stored in a containerd directory such as /run/containerd/io.containerd.runtime.v1.linux/moby/<container-id>/log.json, where <container-id> is the container ID returned by Docker.
- When running sysbox-runc directly, sysbox-runc will not produce any logs by default. Use the sysbox-runc --log option to change this.
Sysbox stores some container state under the /var/lib/sysbox directory (which for security reasons is only accessible to the host's root user).
When no system containers are running, this directory should be clean and look like this:
# tree /var/lib/sysbox
/var/lib/sysbox
├── containerd
├── docker
│ ├── baseVol
│ ├── cowVol
│ └── imgVol
└── kubelet
When a system container is running, this directory holds state for the container:
# tree -L 2 /var/lib/sysbox
/var/lib/sysbox
├── containerd
│ └── f29711b54e16ecc1a03cfabb16703565af56382c8f005f78e40d6e8b28b5d7d3
├── docker
│ ├── baseVol
│ │ └── f29711b54e16ecc1a03cfabb16703565af56382c8f005f78e40d6e8b28b5d7d3
│ ├── cowVol
│ └── imgVol
└── kubelet
└── f29711b54e16ecc1a03cfabb16703565af56382c8f005f78e40d6e8b28b5d7d3
If the system container is stopped and removed, the directory goes back to its clean state:
# tree /var/lib/sysbox
/var/lib/sysbox
├── containerd
├── docker
│ ├── baseVol
│ ├── cowVol
│ └── imgVol
└── kubelet
If you have no system containers created yet and /var/lib/sysbox is not clean, Sysbox is in a bad state. This is very uncommon, as Sysbox is well tested.
To overcome this, you'll need to follow this procedure:
- Stop and remove all system containers (e.g., all Docker containers created with the sysbox-runc runtime).
- There is a bash script to do this here.
- Restart Sysbox:
$ sudo systemctl restart sysbox
- Verify that /var/lib/sysbox is back to a clean state:
# tree /var/lib/sysbox
/var/lib/sysbox
├── containerd
├── docker
│ ├── baseVol
│ ├── cowVol
│ └── imgVol
└── kubelet
When running K8s-in-Docker, if pods fail to deploy, we suggest starting by inspecting the kubelet log inside the K8s node where the failure occurs:
$ docker exec -it <k8s-node> bash
# journalctl -u kubelet
This log often has useful information on why the failure occurred.
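In addition, kubectl can show the events associated with the failing pod (the pod and namespace names below are placeholders):
$ kubectl describe pod <pod-name> -n <namespace>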
One common reason for failure is that the host lacks sufficient storage. In this case you'll see messages like these in the kubelet log:
Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 1284963532 bytes down to the low threshold (80%).
eviction_manager.go:168] Failed to admit pod kube-flannel-ds-amd64-6wkdk_kube-system(e3f4c428-ab15-48af-92eb-f07ce06aa4af) - node has conditions: [DiskPressure]
To overcome this, free up some storage on the host and redeploy the pods.
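For example, removing unused images and stopped containers often frees enough space (docker system prune deletes data, so use it with care):
$ docker system prune -a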
If the problem cannot be explained by any of the previous sections, it may be helpful to obtain core-dumps of the Sysbox daemons (i.e., sysbox-fs and sysbox-mgr). As an example, the instructions below generate a core-dump of the sysbox-fs process.
- Enable core-dump creation with the ulimit command:
$ ulimit -c unlimited
- We use the gcore tool to create core-dumps; it is usually included as part of the gdb package in most Linux distros. Install gdb if not already present in the system.
For Debian / Ubuntu distros:
$ sudo apt-get install gdb
For Fedora / CentOS / Redhat / rpm-based distros:
$ sudo yum install gdb
- Create the core-dump file. Notice that Sysbox containers will continue to operate as usual during (and after) the execution of this instruction, so no service impact is expected.
$ sudo gcore `pidof sysbox-fs`
...
Saved corefile core.195835
...
- Compress the created core file:
$ sudo tar -zcvf core.195835.tar.gz core.195835
$ ls -lrth core.195835.tar.gz
-rw-r--r-- 1 root root 8.4M Apr 20 15:36 core.195835.tar.gz
- Create a Sysbox issue and provide a link to the generated core-dump.