feat: add nvidia builds to ucore (#66)
it's about time
bsherman authored Oct 6, 2023
1 parent efc6ba4 commit 56b1a0f
Showing 6 changed files with 66 additions and 9 deletions.
12 changes: 10 additions & 2 deletions .github/workflows/build.yml
@@ -232,6 +232,9 @@ jobs:
coreos_version:
- stable
- testing
nvidia_tag:
- "-nvidia"
- ""
zfs_tag:
- "-zfs"
- ""
@@ -253,7 +256,7 @@ jobs:
run: |
# Generate a timestamp for creating an image version history
TIMESTAMP="$(date +%Y%m%d)"
COREOS_VERSION="${{ matrix.coreos_version }}${{ matrix.zfs_tag }}"
COREOS_VERSION="${{ matrix.coreos_version }}${{ matrix.nvidia_tag }}${{ matrix.zfs_tag }}"
COMMIT_TAGS=()
BUILD_TAGS=()
@@ -311,6 +314,7 @@ jobs:
build-args: |
COREOS_VERSION=${{ matrix.coreos_version }}
PR_PREFIX=${{ matrix.pr_prefix }}
NVIDIA_TAG=${{ matrix.nvidia_tag }}
ZFS_TAG=${{ matrix.zfs_tag }}
labels: ${{ steps.meta.outputs.labels }}
oci: false
@@ -383,6 +387,9 @@ jobs:
coreos_version:
- stable
- testing
nvidia_tag:
- "-nvidia"
- ""
zfs_tag:
- "-zfs"
- ""
@@ -404,7 +411,7 @@ jobs:
run: |
# Generate a timestamp for creating an image version history
TIMESTAMP="$(date +%Y%m%d)"
COREOS_VERSION="${{ matrix.coreos_version }}${{ matrix.zfs_tag }}"
COREOS_VERSION="${{ matrix.coreos_version }}${{ matrix.nvidia_tag }}${{ matrix.zfs_tag }}"
COMMIT_TAGS=()
BUILD_TAGS=()
@@ -462,6 +469,7 @@ jobs:
build-args: |
COREOS_VERSION=${{ matrix.coreos_version }}
PR_PREFIX=${{ matrix.pr_prefix }}
NVIDIA_TAG=${{ matrix.nvidia_tag }}
ZFS_TAG=${{ matrix.zfs_tag }}
labels: ${{ steps.meta.outputs.labels }}
oci: false
31 changes: 26 additions & 5 deletions README.md
@@ -22,8 +22,12 @@ Suitable for running containerized workloads on either baremetal or virtual mach
- moby-engine(docker), docker-compose and podman-compose
- [tailscale](https://tailscale.com) and [wireguard-tools](https://www.wireguard.com)
- [tmux](https://github.com/tmux/tmux/wiki/Getting-Started)
- Optional ZFS versions also add:
- sanoid/syncoid dependencies - see below for details
- Optional [nvidia versions](#tag-matrix) also add:
- [nvidia driver](https://negativo17.org/nvidia-driver) - latest driver (currently version 535) built from negativo17's akmod package
- [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html) - latest toolkit which supports both root and rootless podman containers and CDI
- [nvidia container selinux policy](https://github.com/NVIDIA/dgx-selinux/tree/master/src/nvidia-container-selinux) - allows using `--security-opt label=type:nvidia_container_t` for some jobs (some will still need `--security-opt label=disable`, as suggested by nvidia)
- Optional [ZFS versions](#tag-matrix) also add:
- [sanoid/syncoid dependencies](https://github.com/jimsalterjrs/sanoid) - [see below](#zfs) for details
- [ZFS](https://openzfs.github.io/openzfs-docs/Getting%20Started/Fedora/index.html)
- Enables staging of automatic system updates via rpm-ostreed
- Enables password based SSH auth (required for locally running cockpit web interface)
@@ -95,9 +99,23 @@ sanoid/syncoid is a great tool for manual and automated snapshot/transfer of ZFS

`ucore` has pre-installed all the (lightweight) required dependencies (perl-Config-IniFiles perl-Data-Dumper perl-Capture-Tiny perl-Getopt-Long lzop mbuffer mhash pv), such that a user wishing to use sanoid/syncoid only needs to install the "sbin" files and create configuration/systemd units for it.

### NVIDIA

If you installed an image with `-nvidia` in the tag, the nvidia kernel module, basic CUDA libraries, and the nvidia-container-toolkit are all pre-installed.

Note that this does NOT add desktop graphics services to your images, but it DOES enable your compatible nvidia GPU to be used for nvdec, nvenc, CUDA, etc. Since this is CoreOS and it's primarily intended for container workloads, the [nvidia container toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html) should be well understood.

Note that the included driver is the [latest nvidia driver](https://github.com/negativo17/nvidia-driver/blob/master/nvidia-driver.spec) as bundled by [negativo17](https://negativo17.org/nvidia-driver/). This package was chosen over rpmfusion's due to its granular packages, which allow us to install just the minimal `nvidia-driver-cuda` packages.
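
As a rough illustration of the container workflow described above, the sketch below prints a podman smoke-test command rather than running it. It assumes a `-nvidia` image on a host with a supported GPU, that a CDI spec has already been generated with `nvidia-ctk cdi generate` as described in the NVIDIA container toolkit documentation, and that the `ubuntu` image is an acceptable test container. The `gpu_smoke_test_cmd` helper is hypothetical and not part of `ucore`.

```shell
#!/usr/bin/env bash
# Sketch only: prints the command instead of executing it, so it is safe
# to run on a machine without a GPU.
#
# Prerequisite on a real host (run once, as root), per the NVIDIA docs:
#   nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Build a podman command line for a quick GPU check.
# $1: SELinux label option; defaults to the policy type shipped with the image.
gpu_smoke_test_cmd() {
  local label_opt="${1:-label=type:nvidia_container_t}"
  local image="docker.io/library/ubuntu:latest"  # assumption: any small image works
  printf 'podman run --rm --device nvidia.com/gpu=all --security-opt %s %s nvidia-smi -L\n' \
    "$label_opt" "$image"
}

gpu_smoke_test_cmd
```

Pass `label=disable` as the argument if a workload fails under the `nvidia_container_t` label.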

#### Other NVIDIA Drivers

If you need an older (or different) driver, consider looking at the [container-toolkit-fcos driver](https://hub.docker.com/r/fifofonix/driver/). It provides pre-bundled container images with nvidia drivers for FCOS, allowing auto-build/loading of the nvidia driver IN podman, at boot, via a systemd service.

If you go this route, you likely won't want to use the `ucore` `-nvidia` image, but would instead use the suggested systemd service. The nvidia container toolkit will still be required but can be layered easily.

### ZFS

The ZFS kernel module and tools are pre-installed, but like other services, ZFS is not pre-configured to load by default.
If you installed an image with `-zfs` in the tag (or `fedora-coreos-zfs`), the ZFS kernel module and tools are pre-installed, but like other services, ZFS is not pre-configured to load by default.

Load it with the command `modprobe zfs` and use `zfs` and `zpool` commands as desired.

@@ -152,10 +170,13 @@ To rebase an Fedora CoreOS machine to the latest uCore (stable):
sudo rpm-ostree rebase ostree-unverified-registry:ghcr.io/ublue-os/IMAGE:TAG
```

#### Tag Matrix
| IMAGE | TAG |
|-|-|
| [`ucore`](#ucore) | `stable`, `testing`, `stable-zfs`, `testing-zfs` |
| [`ucore-hci`](#ucore-hci) | `stable`, `testing`, `stable-zfs`, `testing-zfs` |
| [`ucore`](#ucore) - *stable* | `stable`, `stable-nvidia`, `stable-zfs`, `stable-nvidia-zfs` |
| [`ucore`](#ucore) - *testing* | `testing`, `testing-nvidia`, `testing-zfs`, `testing-nvidia-zfs` |
| [`ucore-hci`](#ucore-hci) - *stable* | `stable`, `stable-nvidia`, `stable-zfs`, `stable-nvidia-zfs` |
| [`ucore-hci`](#ucore-hci) - *testing* | `testing`, `testing-nvidia`, `testing-zfs`, `testing-nvidia-zfs` |
| [`fedora-coreos-zfs`](#fedora-coreos-zfs) | `stable`, `testing` |
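
The `ucore` and `ucore-hci` rows follow from how the build workflow concatenates `coreos_version`, `nvidia_tag`, and `zfs_tag`. A minimal sketch, assuming only the `ghcr.io/ublue-os` registry path used in the rebase command above (the `list_tags` helper is illustrative and not part of the repository):

```shell
#!/usr/bin/env bash
# Sketch: enumerate image references the way the workflow composes tags,
# i.e. <coreos_version><nvidia_tag><zfs_tag>.

# Print every image:tag reference for one image name.
# $1: image name, e.g. "ucore" or "ucore-hci".
list_tags() {
  local image="$1"
  local version nvidia zfs
  for version in stable testing; do
    for nvidia in "" "-nvidia"; do
      for zfs in "" "-zfs"; do
        echo "ghcr.io/ublue-os/${image}:${version}${nvidia}${zfs}"
      done
    done
  done
}

list_tags ucore
```

Any printed reference can stand in for `IMAGE:TAG` in the rebase command, for example `ostree-unverified-registry:ghcr.io/ublue-os/ucore:stable-nvidia`. The `fedora-coreos-zfs` image is not covered by this loop, since it only has `stable` and `testing` tags.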


3 changes: 2 additions & 1 deletion hci/Containerfile
@@ -1,9 +1,10 @@
ARG COREOS_VERSION="${COREOS_VERSION:-stable}"
ARG IMAGE_NAME="${IMAGE_NAME:-ucore}"
ARG PR_PREFIX="${PR_PREFIX}"
ARG NVIDIA_TAG="${NVIDIA_TAG}"
ARG ZFS_TAG="${ZFS_TAG}"

FROM ghcr.io/ublue-os/${IMAGE_NAME}:${PR_PREFIX}${COREOS_VERSION}${ZFS_TAG}
FROM ghcr.io/ublue-os/${IMAGE_NAME}:${PR_PREFIX}${COREOS_VERSION}${NVIDIA_TAG}${ZFS_TAG}

ARG COREOS_VERSION="${COREOS_VERSION:-stable}"
ARG IMAGE_NAME="${IMAGE_NAME:-ucore}"
8 changes: 7 additions & 1 deletion main/Containerfile
@@ -4,12 +4,18 @@ FROM quay.io/fedora/fedora-coreos:${COREOS_VERSION}

ARG COREOS_VERSION="${COREOS_VERSION:-stable}"
ARG IMAGE_NAME="${IMAGE_NAME:-ucore}"
# build with --build-arg NVIDIA_TAG="-nvidia" to install nvidia
ARG NVIDIA_TAG="${NVIDIA_TAG}"
# build with --build-arg ZFS_TAG="-zfs" to install zfs
ARG ZFS_TAG="${ZFS_TAG}"
ARG KMOD_SRC="${KMOD_SRC:-ghcr.io/ublue-os/ucore-kmods:${COREOS_VERSION}}"

COPY --from=${KMOD_SRC} /rpms/kmods/nvidia/*.rpm /tmp/rpms/nvidia/
COPY --from=${KMOD_SRC} /rpms/kmods/zfs/*.rpm /tmp/rpms/zfs/

COPY *.sh /tmp/
COPY packages.json /tmp/packages.json

COPY --from=ghcr.io/ublue-os/ucore-kmods:${COREOS_VERSION} /rpms/kmods/zfs/*.rpm /tmp/rpms/zfs/
COPY usr /usr

RUN mkdir -p /var/lib/alternatives \
14 changes: 14 additions & 0 deletions main/install.sh
@@ -34,6 +34,20 @@ if [[ "-zfs" == "${ZFS_TAG}" ]]; then
pv
fi

## CONDITIONAL: install NVIDIA
if [[ "-nvidia" == "${NVIDIA_TAG}" ]]; then
# repo for nvidia rpms
curl -L https://negativo17.org/repos/fedora-nvidia.repo -o /etc/yum.repos.d/fedora-nvidia.repo

rpm-ostree install /tmp/rpms/nvidia/ublue-os-ucore-nvidia-*.rpm
sed -i '0,/enabled=0/{s/enabled=0/enabled=1/}' /etc/yum.repos.d/nvidia-container-toolkit.repo

rpm-ostree install \
/tmp/rpms/nvidia/kmod-nvidia-*.rpm \
nvidia-driver-cuda \
nvidia-container-toolkit
fi

## ALWAYS: install regular packages

# add tailscale repo
7 changes: 7 additions & 0 deletions main/post-install.sh
@@ -7,6 +7,13 @@ if [[ "-zfs" == "${ZFS_TAG}" ]]; then
echo "no post-install tasks for ZFS"
fi

## CONDITIONAL: post-install NVIDIA
if [[ "-nvidia" == "${NVIDIA_TAG}" ]]; then
sed -i 's@enabled=1@enabled=0@g' /etc/yum.repos.d/nvidia-container-toolkit.repo

semodule --verbose --install /usr/share/selinux/packages/nvidia-container.pp
fi


## ALWAYS: regular post-install
systemctl disable docker.socket
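
The repo toggling done by `main/install.sh` and `main/post-install.sh` can be tried on a scratch file. This is a sketch under stated assumptions: GNU sed is required (the `0,/re/` address and `-i` without a suffix are GNU extensions), and the two-section repo file below is hypothetical, since the contents of the real `nvidia-container-toolkit.repo` are not shown in this commit.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the enable/disable steps on a temporary file.

repo="$(mktemp)"
# Hypothetical repo file with two sections, both disabled.
cat > "$repo" <<'EOF'
[nvidia-container-toolkit]
enabled=0
[nvidia-container-toolkit-experimental]
enabled=0
EOF

# install.sh step: the "0,/enabled=0/" range ends at the first match,
# so only the first section is enabled.
sed -i '0,/enabled=0/{s/enabled=0/enabled=1/}' "$repo"
after_install="$(grep -c '^enabled=1$' "$repo" || true)"

# post-install.sh step: no address is given, so every enabled section
# is switched back off.
sed -i 's@enabled=1@enabled=0@g' "$repo"
after_post="$(grep -c '^enabled=1$' "$repo" || true)"

echo "enabled sections after install: ${after_install}"
echo "enabled sections after post-install: ${after_post}"
rm -f "$repo"
```

The effect is that the repo is usable only while the image is being built and is left disabled in the final image.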
