feat: add generic fcos build with nvidia (#95)
bsherman authored Oct 19, 2023
1 parent ee9297e commit c124305
Showing 6 changed files with 128 additions and 49 deletions.
37 changes: 26 additions & 11 deletions .github/workflows/build.yml
@@ -59,11 +59,11 @@ jobs:
          echo "${{ toJSON(steps.stable.outputs) }}"
          echo "${{ toJSON(steps.testing.outputs) }}"
  build_fcos_zfs:
    name: Build CoreOS ZFS
  build_fcos:
    name: Build CoreOS
    runs-on: ubuntu-22.04
    if: always() && !cancelled()
    needs: [ build_info, coreos_versions]
    needs: [build_info, coreos_versions]
    permissions:
      contents: read
      packages: write
@@ -72,17 +72,30 @@ jobs:
      fail-fast: false
      matrix:
        image_name:
          - fedora-coreos-zfs
          - fedora-coreos
        coreos_version:
          - stable
          - testing
        nvidia_tag:
          - "-nvidia"
          - ""
        zfs_tag:
          - "-zfs"
          - ""
        pr_prefix:
          - ${{ needs.build_info.outputs.pr_prefix }}
        include:
          - coreos_version: stable
            image_version: ${{ needs.coreos_versions.outputs.stable_version }}
          - coreos_version: testing
            image_version: ${{ needs.coreos_versions.outputs.testing_version }}
        exclude:
          - coreos_version: stable
            nvidia_tag: ""
            zfs_tag: ""
          - coreos_version: testing
            nvidia_tag: ""
            zfs_tag: ""
    steps:
      # Checkout push-to-registry action GitHub repository
      - name: Checkout Push to Registry action
@@ -94,7 +107,7 @@ jobs:
        run: |
          # Generate a timestamp for creating an image version history
          TIMESTAMP="$(date +%Y%m%d)"
          COREOS_VERSION="${{ matrix.coreos_version }}"
          COREOS_VERSION="${{ matrix.coreos_version }}${{ matrix.nvidia_tag }}${{ matrix.zfs_tag }}"
          COMMIT_TAGS=()
          BUILD_TAGS=()
@@ -134,7 +147,7 @@ jobs:
          labels: |
            io.artifacthub.package.logo-url=https://avatars.githubusercontent.com/u/120078124?s=200&v=4
            io.artifacthub.package.readme-url=https://raw.githubusercontent.com/ublue-os/ucore/main/README.md
            org.opencontainers.image.description=An OCI image of Fedora CoreOS with ZFS pre-installed
            org.opencontainers.image.description=An OCI image of Fedora CoreOS with NVIDIA and/or ZFS pre-installed
            org.opencontainers.image.title=${{ matrix.image_name }}
            org.opencontainers.image.version=${{ matrix.image_version }}
@@ -144,14 +157,16 @@ jobs:
        uses: redhat-actions/buildah-build@v2
        with:
          containerfiles: |
            ./fedora-coreos-zfs/Containerfile
          context: ./fedora-coreos-zfs
            ./fedora-coreos/Containerfile
          context: ./fedora-coreos
          image: ${{ matrix.image_name }}
          tags: |
            ${{ steps.generate-tags.outputs.alias_tags }}
          build-args: |
            COREOS_VERSION=${{ matrix.coreos_version }}
            PR_PREFIX=${{ matrix.pr_prefix }}
            NVIDIA_TAG=${{ matrix.nvidia_tag }}
            ZFS_TAG=${{ matrix.zfs_tag }}
          labels: ${{ steps.meta.outputs.labels }}
          oci: false

@@ -210,7 +225,7 @@ jobs:
    name: Build uCore
    runs-on: ubuntu-22.04
    if: always() && !cancelled()
    needs: [ build_info, coreos_versions]
    needs: [build_info, coreos_versions]
    permissions:
      contents: read
      packages: write
@@ -365,7 +380,7 @@ jobs:
    name: Build HCI
    runs-on: ubuntu-22.04
    if: always() && !cancelled()
    needs: [ build_info, build_main, coreos_versions]
    needs: [build_info, build_main, coreos_versions]
    permissions:
      contents: read
      packages: write
@@ -519,7 +534,7 @@ jobs:
  check:
    name: Check all builds successful
    runs-on: ubuntu-latest
    needs: [build_fcos_zfs, build_main, build_hci]
    needs: [build_fcos, build_main, build_hci]
    steps:
      - name: Exit
        shell: bash
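
The `build_fcos` matrix in this file combines two streams with optional `-nvidia` and `-zfs` suffixes, then excludes the combinations where both suffixes are empty. A minimal sketch of that expansion in plain shell, assuming only what this diff shows: the function name `list_fcos_tags` is hypothetical, and the real `generate-tags` step may add further tags (timestamps, PR prefixes) that are not visible in these hunks.

```shell
#!/bin/sh
# Enumerate the base tags produced by the build_fcos matrix.
# Mirrors the workflow line:
#   COREOS_VERSION="${coreos_version}${nvidia_tag}${zfs_tag}"
list_fcos_tags() {
    for coreos_version in stable testing; do
        for nvidia_tag in "-nvidia" ""; do
            for zfs_tag in "-zfs" ""; do
                # The workflow's `exclude` entries drop the plain builds
                # (neither NVIDIA nor ZFS) for both streams.
                if [ -z "$nvidia_tag" ] && [ -z "$zfs_tag" ]; then
                    continue
                fi
                printf '%s%s%s\n' "$coreos_version" "$nvidia_tag" "$zfs_tag"
            done
        done
    done
}

list_fcos_tags
```

This yields six variants, which lines up with the `fedora-coreos` rows added to the README tag matrix.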
31 changes: 20 additions & 11 deletions README.md
@@ -10,6 +10,21 @@ WARNING: This image has **not** been heavily tested, though the underlying compo

## Images & Features

### `fedora-coreos`

**NOTE: this image was formerly named `fedora-coreos-zfs`; that version did not offer the nvidia option. Please update with `rpm-ostree rebase`.**

A generic [Fedora CoreOS image](https://quay.io/repository/fedora/fedora-coreos?tab=tags) with a choice of add-on kernel modules:

- [nvidia versions](#tag-matrix) add:
  - [nvidia driver](https://github.com/ublue-os/ucore-kmods) - latest driver (currently version 535) built from negativo17's akmod package
  - [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html) - latest toolkit which supports both root and rootless podman containers and CDI
  - [nvidia container selinux policy](https://github.com/NVIDIA/dgx-selinux/tree/master/src/nvidia-container-selinux) - allows using `--security-opt label=type:nvidia_container_t` for some jobs (some will still need `--security-opt label=disable` as suggested by nvidia)
- [ZFS versions](#tag-matrix) add:
  - [ZFS driver](https://github.com/ublue-os/ucore-kmods) - latest driver (currently pinned to 2.1.x series)

*NOTE: currently, zincati fails to start on systems with OCI based deployments (like uCore). Upstream efforts are active to correct this.*

### `ucore`

Suitable for running containerized workloads on either baremetal or virtual machines, this image tries to stay lightweight but functional for multiple use cases, including that of a storage server (NAS).
@@ -27,17 +42,16 @@ Suitable for running containerized workloads on either baremetal or virtual mach
- [tailscale](https://tailscale.com) and [wireguard-tools](https://www.wireguard.com)
- [tmux](https://github.com/tmux/tmux/wiki/Getting-Started)
- udev rules enabling full functionality on some [Realtek 2.5Gbit USB Ethernet](https://github.com/wget/realtek-r8152-linux/) devices
- Optional [nvidia versions](#tag-matrix) also add:
- Optional [nvidia versions](#tag-matrix) add:
  - [nvidia driver](https://github.com/ublue-os/ucore-kmods) - latest driver (currently version 535) built from negativo17's akmod package
  - [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html) - latest toolkit which supports both root and rootless podman containers and CDI
  - [nvidia container selinux policy](https://github.com/NVIDIA/dgx-selinux/tree/master/src/nvidia-container-selinux) - allows using `--security-opt label=type:nvidia_container_t` for some jobs (some will still need `--security-opt label=disable` as suggested by nvidia)
- Optional [ZFS versions](#tag-matrix) also add:
- Optional [ZFS versions](#tag-matrix) add:
  - [sanoid/syncoid dependencies](https://github.com/jimsalterjrs/sanoid) - [see below](#zfs) for details
  - [zfs driver](https://github.com/ublue-os/ucore-kmods) - latest driver (currently pinned to 2.1.x series)
  - [ZFS driver](https://github.com/ublue-os/ucore-kmods) - latest driver (currently pinned to 2.1.x series)
- Enables staging of automatic system updates via rpm-ostreed
- Enables password based SSH auth (required for locally running cockpit web interface)
- Disables Zincati auto upgrade/reboot service
- *NOTE: currently, zincati fails to start on systems with OCI based deployments (like uCore). Upstream efforts are active to correct this.*

Note: per [cockpit instructions](https://cockpit-project.org/running.html#coreos) the cockpit-ws RPM is **not** installed, rather it is provided as a pre-defined systemd service which runs a podman container.

@@ -55,12 +69,6 @@ Hyper-Converged Infrastructure (HCI) refers to storage and virtualization in one p

Note: Fedora now uses `DefaultTimeoutStop=45s` for systemd services which could cause `libvirtd` to quit before shutting down slow VMs. Consider adding `TimeoutStopSec=120s` as an override for `libvirtd.service` if needed.

### `fedora-coreos-zfs`

- A generic [Fedora CoreOS image](https://quay.io/repository/fedora/fedora-coreos?tab=tags) image
- Adds [ZFS](https://openzfs.github.io/openzfs-docs/Getting%20Started/Fedora/index.html) from the [ucore-kmods image](https://github.com/ublue-os/ucore-kmods)
- Does NOT add sanoid/syncoid dependencies as mentioned above in `ucore` features list

## Tips and Tricks

### Immutability and Podman
@@ -180,11 +188,12 @@ sudo rpm-ostree rebase ostree-unverified-registry:ghcr.io/ublue-os/IMAGE:TAG
#### Tag Matrix
| IMAGE | TAG |
|-|-|
| [`fedora-coreos`](#fedora-coreos) - *stable* | `stable-nvidia`, `stable-zfs`, `stable-nvidia-zfs` |
| [`fedora-coreos`](#fedora-coreos) - *testing* | `testing-nvidia`, `testing-zfs`, `testing-nvidia-zfs` |
| [`ucore`](#ucore) - *stable* | `stable`, `stable-nvidia`, `stable-zfs`, `stable-nvidia-zfs` |
| [`ucore`](#ucore) - *testing* | `testing`, `testing-nvidia`, `testing-zfs`, `testing-nvidia-zfs` |
| [`ucore-hci`](#ucore-hci) - *stable* | `stable`, `stable-nvidia`, `stable-zfs`, `stable-nvidia-zfs` |
| [`ucore-hci`](#ucore-hci) - *testing* | `testing`, `testing-nvidia`, `testing-zfs`, `testing-nvidia-zfs` |
| [`fedora-coreos-zfs`](#fedora-coreos-zfs) | `stable`, `testing` |
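
For hosts still on the removed `fedora-coreos-zfs` image, the README note asks for an `rpm-ostree rebase`. A small sketch of composing that command from the tag matrix, using the rebase form shown in the hunk header above. The helper `rebase_command` is hypothetical, and treating `fedora-coreos:stable-zfs` as the successor of `fedora-coreos-zfs:stable` is an assumption drawn from the matrix, not something the commit states. The function only prints the command rather than running it.

```shell
#!/bin/sh
# Print (do not run) the rebase command for a given image and tag.
rebase_command() {
    image="$1"   # e.g. fedora-coreos, ucore, ucore-hci
    tag="$2"     # e.g. stable-zfs, testing-nvidia-zfs
    printf 'sudo rpm-ostree rebase ostree-unverified-registry:ghcr.io/ublue-os/%s:%s\n' \
        "$image" "$tag"
}

# Assumed migration target for a host on the old fedora-coreos-zfs:stable image.
rebase_command fedora-coreos stable-zfs
```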



27 changes: 0 additions & 27 deletions fedora-coreos-zfs/Containerfile

This file was deleted.

25 changes: 25 additions & 0 deletions fedora-coreos/Containerfile
@@ -0,0 +1,25 @@
ARG COREOS_VERSION="${COREOS_VERSION:-stable}"

FROM quay.io/fedora/fedora-coreos:${COREOS_VERSION}

ARG COREOS_VERSION="${COREOS_VERSION:-stable}"
# build with --build-arg NVIDIA_TAG="-nvidia" to install nvidia
ARG NVIDIA_TAG="${NVIDIA_TAG}"
# build with --build-arg ZFS_TAG="-zfs" to install zfs
ARG ZFS_TAG="${ZFS_TAG}"
ARG KMOD_SRC="${KMOD_SRC:-ghcr.io/ublue-os/ucore-kmods:${COREOS_VERSION}}"

COPY --from=${KMOD_SRC} /rpms/kmods/nvidia/*.rpm /tmp/rpms/nvidia/
COPY --from=${KMOD_SRC} /rpms/kmods/zfs/*.rpm /tmp/rpms/zfs/

COPY *.sh /tmp/

RUN mkdir -p /var/lib/alternatives \
    && /tmp/install.sh \
    && /tmp/post-install.sh \
    && mv /var/lib/alternatives /staged-alternatives \
    && rm -fr /tmp/* /var/* \
    && ostree container commit \
    && mkdir -p /var/lib && mv /staged-alternatives /var/lib/alternatives \
    && mkdir -p /tmp /var/tmp \
    && chmod -R 1777 /tmp /var/tmp
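
The comments in this Containerfile describe selecting a variant through build args. A sketch of a matching local build invocation, assuming `podman` as the builder and the repository root as the working directory. The `localhost/fedora-coreos` image name and the `fcos_build_command` helper are hypothetical. The helper prints the command so it can be reviewed before anything is built.

```shell
#!/bin/sh
# Assemble and print a local build command for one image variant.
fcos_build_command() {
    coreos_version="$1"   # stable or testing
    nvidia_tag="$2"       # "-nvidia" or empty
    zfs_tag="$3"          # "-zfs" or empty
    printf 'podman build'
    printf ' --build-arg COREOS_VERSION=%s' "$coreos_version"
    printf ' --build-arg NVIDIA_TAG=%s' "$nvidia_tag"
    printf ' --build-arg ZFS_TAG=%s' "$zfs_tag"
    printf ' --file fedora-coreos/Containerfile'
    printf ' --tag localhost/fedora-coreos:%s%s%s' \
        "$coreos_version" "$nvidia_tag" "$zfs_tag"
    # The build context is the fedora-coreos directory, as in the workflow.
    printf ' fedora-coreos\n'
}

# Example: the stable stream with NVIDIA only.
fcos_build_command stable "-nvidia" ""
```

Passing an empty string for a tag leaves the corresponding `if` branch in `install.sh` untaken, since the scripts compare against the exact values `-nvidia` and `-zfs`.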
42 changes: 42 additions & 0 deletions fedora-coreos/install.sh
@@ -0,0 +1,42 @@
#!/bin/sh

set -ouex pipefail

RELEASE="$(rpm -E %fedora)"

#### PREPARE
# enable testing repos if not enabled on testing stream
if [[ "testing" == "${COREOS_VERSION}" ]]; then
    for REPO in $(ls /etc/yum.repos.d/fedora-updates-testing{,-modular}.repo); do
        if [[ "$(grep enabled=1 ${REPO} > /dev/null; echo $?)" == "1" ]]; then
            echo "enabling $REPO" &&
            sed -i '0,/enabled=0/{s/enabled=0/enabled=1/}' ${REPO}
        fi
    done
fi

# always disable cisco-openh264 repo
sed -i 's@enabled=1@enabled=0@g' /etc/yum.repos.d/fedora-cisco-openh264.repo

#### INSTALL
# inspect to see what RPMS we copied in
find /tmp/rpms/

## CONDITIONAL: install ZFS (and sanoid deps)
if [[ "-zfs" == "${ZFS_TAG}" ]]; then
    rpm-ostree install pv /tmp/rpms/zfs/*.rpm
fi

## CONDITIONAL: install NVIDIA
if [[ "-nvidia" == "${NVIDIA_TAG}" ]]; then
    # repo for nvidia rpms
    curl -L https://negativo17.org/repos/fedora-nvidia.repo -o /etc/yum.repos.d/fedora-nvidia.repo

    rpm-ostree install /tmp/rpms/nvidia/ublue-os-ucore-nvidia-*.rpm
    sed -i '0,/enabled=0/{s/enabled=0/enabled=1/}' /etc/yum.repos.d/nvidia-container-toolkit.repo

    rpm-ostree install \
        /tmp/rpms/nvidia/kmod-nvidia-*.rpm \
        nvidia-driver-cuda \
        nvidia-container-toolkit
fi
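
The `sed -i '0,/enabled=0/{...}'` idiom used twice in this script relies on a GNU sed address form: the range `0,/re/` ends at the first matching line, so only the first `enabled=0` is flipped and later sections of the repo file stay disabled. A self-contained sketch on a throwaway file, assuming GNU sed; the helper name `enable_first_section` and the section names are hypothetical.

```shell
#!/bin/sh
# Enable only the first section of a .repo file, and only when no section
# is enabled yet (the same guard install.sh applies to the testing repos).
enable_first_section() {
    repo_file="$1"
    if ! grep -q 'enabled=1' "$repo_file"; then
        # GNU sed: the address 0,/re/ stops at the first match, so the
        # substitution touches only the first "enabled=0" line.
        sed -i '0,/enabled=0/{s/enabled=0/enabled=1/}' "$repo_file"
    fi
}

# Demo on a temporary file with two sections, both disabled.
demo_repo="$(mktemp)"
printf '%s\n' '[example-testing]' 'enabled=0' \
    '[example-testing-debuginfo]' 'enabled=0' > "$demo_repo"
enable_first_section "$demo_repo"
cat "$demo_repo"
rm -f "$demo_repo"
```

In the demo, the first section ends up with `enabled=1` while the second keeps `enabled=0`. By contrast, the `s@enabled=1@enabled=0@g` form used for the cisco-openh264 repo has no address, so it rewrites every matching line in the file.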
15 changes: 15 additions & 0 deletions fedora-coreos/post-install.sh
@@ -0,0 +1,15 @@
#!/bin/sh

set -ouex pipefail

## CONDITIONAL: post-install ZFS
if [[ "-zfs" == "${ZFS_TAG}" ]]; then
    echo "no post-install tasks for ZFS"
fi

## CONDITIONAL: post-install NVIDIA
if [[ "-nvidia" == "${NVIDIA_TAG}" ]]; then
    sed -i 's@enabled=1@enabled=0@g' /etc/yum.repos.d/nvidia-container-toolkit.repo

    semodule --verbose --install /usr/share/selinux/packages/nvidia-container.pp
fi
