This guide shows how to deploy a multi-node OpenShift cluster with 3x masters and 2x workers using the Agent-based Installer and how to run OpenShift's conformance test suite. It uses libvirt domains (QEMU/KVM based virtual machines) running in a Podman container to simulate bare-metal servers and auxiliary resources.
First, install git and Podman on a bare-metal system with Debian 11 (Bullseye), CentOS Stream 8, Fedora Linux 33, Ubuntu 22.04 LTS (Jammy Jellyfish) or newer. Ensure the system has KVM nested virtualization enabled, has enough storage to hold the disk images of the virtual machines, and is not connected to the IP networks 192.168.157.0/24 and 192.168.158.0/24. Then run:
git clone https://github.com/JM1/ansible-collection-jm1-cloudy.git
cd ansible-collection-jm1-cloudy/
cp -i ansible.cfg.example ansible.cfg
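As a quick sanity check of the prerequisites above, you can verify on the bare-metal system that KVM is usable, that enough disk space is available and that the two IP networks are not already in use. This is only a rough sketch; adjust the storage path and the required free space to your setup:
# KVM must be available to the kernel (kvm plus kvm_intel/kvm_amd modules loaded)
ls -l /dev/kvm
# Check free disk space for the virtual machine disk images
# (the exact path depends on where Podman stores its volumes)
df -h /var/lib/containers
# The host must not already use the networks reserved for the virtual machines;
# expect no output here
ip route | grep -E '192\.168\.(157|158)\.' || true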
OpenShift requires pull secrets to authenticate with the container registries Quay.io and registry.redhat.io, which serve the container images for OpenShift Container Platform components. Download pull secrets from the Red Hat Cloud Console and store them in file pull-secret.txt in the repository directory ansible-collection-jm1-cloudy.
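The downloaded pull secret is a JSON file with an auths map. Before continuing, it can help to verify that it parses and contains entries for the registries mentioned above, for example with jq (assuming jq is installed):
# List the registries covered by the pull secret; quay.io and registry.redhat.io
# should appear in the output
jq -r '.auths | keys[]' pull-secret.txt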
If you want to deploy an OpenShift release image built by OpenShift CI, you also have to get a pull secret for registry.ci.openshift.org (guide): Ensure your GitHub.com user is a member of the OpenShift organization, otherwise request access here. Then request an API token; it will look like sha256~abcdefghijklmnopqrstuvwxyz01234567890abcdef. Use this token to log in and store it in pull-secret.txt with (replace $GITHUB_USER and $API_TOKEN):
podman login --authfile pull-secret.txt -u $GITHUB_USER -p $API_TOKEN registry.ci.openshift.org
NOTE: Tokens for registry.ci.openshift.org expire quickly, so expect to request new tokens monthly.
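Because these tokens expire, it can be useful to check whether the credentials stored in pull-secret.txt still work before starting a deployment, for example with skopeo (assuming skopeo is installed; the image reference is the CI release image used further below and is only an example):
# Query image metadata without pulling it; an authentication error indicates
# that the token in pull-secret.txt has expired
skopeo inspect --authfile pull-secret.txt docker://registry.ci.openshift.org/ocp/release:4.14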
Grant the unprivileged user inside the Podman container access to the pull secrets:
chmod a+r pull-secret.txt
Next, change the host_vars of Ansible host lvrt-lcl-session-srv-530-okd-abi-ha-provisioner to read the pull secrets from file pull-secret.txt. Open file inventory/host_vars/lvrt-lcl-session-srv-530-okd-abi-ha-provisioner.yml and change variable openshift_abi_pullsecret to:
openshift_abi_pullsecret: "{{ lookup('ansible.builtin.file', '/home/cloudy/project/pull-secret.txt') }}"
Edit openshift_abi_release_image in file inventory/host_vars/lvrt-lcl-session-srv-530-okd-abi-ha-provisioner.yml to the OpenShift release you want to deploy:
openshift_abi_release_image: "{{ lookup('ansible.builtin.pipe', openshift_abi_release_image_query) }}"
openshift_abi_release_image_query: |
  curl -s https://mirror.openshift.com/pub/openshift-v4/amd64/clients/ocp/stable-4.14/release.txt \
  | grep 'Pull From: quay.io' \
  | awk -F ' ' '{print $3}'
Or set it directly, for example to a release image from OpenShift CI:
openshift_abi_release_image: 'registry.ci.openshift.org/ocp/release:4.14'
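If you use the query form above, you can preview what it resolves to by running the pipeline manually before starting the deployment; it should print exactly one image reference, typically a quay.io/openshift-release-dev/ocp-release reference whose exact tag or digest depends on the current stable-4.14 release:
# Expect a single line such as quay.io/openshift-release-dev/ocp-release@sha256:...
curl -s https://mirror.openshift.com/pub/openshift-v4/amd64/clients/ocp/stable-4.14/release.txt \
  | grep 'Pull From: quay.io' \
  | awk -F ' ' '{print $3}'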
If your corporate network blocks access to public NTP servers, edit Ansible variable chrony_config for host lvrt-lcl-session-srv-500-okd-abi-ha-router in file inventory/host_vars/lvrt-lcl-session-srv-500-okd-abi-ha-router.yml. For example, if your internal NTP servers are grouped in a pool clock.company.com, change chrony_config to:
chrony_config:
- ansible.builtin.copy:
    content: |
      allow 192.168.158.0/24
      # Corporate network blocks all NTP traffic except to internal NTP servers.
      pool clock.company.com iburst
    dest: /etc/chrony/conf.d/home.arpa.conf
    mode: u=rw,g=r,o=
    group: root
    owner: root
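Before starting the deployment, you may want to confirm that the internal NTP pool is actually reachable from the bare-metal system. One way is a one-shot query with chronyd (assuming chrony is installed on the host; clock.company.com is the placeholder pool name from the example above, and depending on your chrony version you may need to run this as root):
# Query the pool once without changing the system clock; a reported offset means
# the servers are reachable, a timeout means NTP traffic is still blocked
chronyd -Q 'pool clock.company.com iburst'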
Create Podman networks, volumes and containers, and attach to a container named cloudy with:
cd containers/
sudo DEBUG=yes DEBUG_SHELL=yes ./podman-compose.sh up
Inside this container, a Bash shell will be spawned for user cloudy. This user will run the libvirt domains (QEMU/KVM based virtual machines).
Launch the first set of virtual machines with the following command, run from cloudy's Bash shell:
ansible-playbook playbooks/site.yml --limit \
lvrt-lcl-session-srv-500-okd-abi-ha-router,\
lvrt-lcl-session-srv-501-okd-abi-ha-bmc,\
lvrt-lcl-session-srv-510-okd-abi-ha-cp0,\
lvrt-lcl-session-srv-511-okd-abi-ha-cp1,\
lvrt-lcl-session-srv-512-okd-abi-ha-cp2,\
lvrt-lcl-session-srv-520-okd-abi-ha-w0,\
lvrt-lcl-session-srv-521-okd-abi-ha-w1
This playbook run sets up a router which provides DHCP, DNS and NTP services and internet access. It starts sushy-emulator to provide a virtual Redfish BMC, used to power cycle servers and mount virtual media for hardware inspection and provisioning. It also creates the virtual machines for OpenShift's master and worker nodes, but without an operating system and in stopped (shut off) state.
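You can confirm this from cloudy's Bash shell with virsh; the control plane (cp) and worker (w) domains should be listed but remain shut off until the provisioner powers them on via the virtual Redfish BMC:
# List all libvirt domains of the cloudy user session, including inactive ones
virsh list --all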
NOTE: If the Ansible run fails, try running the playbook again.
Launch another virtual machine to run the OpenShift Agent-based Installer (ABI) and deploy the OpenShift cluster:
ansible-playbook playbooks/site.yml \
--limit lvrt-lcl-session-srv-530-okd-abi-ha-provisioner \
--skip-tags jm1.cloudy.openshift_tests
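While this playbook runs, you can follow the provisioning from another shell inside the cloudy container, for example by attaching to the serial console of the first control plane node (note the .home.arpa suffix of the libvirt domain names):
# Follow the boot and installation of control plane node cp0; detach with Ctrl+]
virsh console lvrt-lcl-session-srv-510-okd-abi-ha-cp0.home.arpa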
To access the cluster when Ansible is done, connect to the virtual machine which initiated the cluster installation (Ansible host lvrt-lcl-session-srv-530-okd-abi-ha-provisioner):
ssh ansible@192.168.158.48
The cluster uses internal DHCP and DNS services which are not accessible from the container host. To connect to the virtual machine from another shell on the container host (the bare-metal system), run:
sudo podman exec -ti -u cloudy cloudy ssh ansible@192.168.158.48
From ansible's Bash shell at lvrt-lcl-session-srv-530-okd-abi-ha-provisioner, the cluster can be accessed with:
export KUBECONFIG=/home/ansible/clusterconfigs/auth/kubeconfig
oc get nodes
oc debug node/cp0
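To check that the deployment has fully settled, you can also inspect the cluster version and operators from the same shell; once the installation is complete, all cluster operators should be available and none should be degraded:
# Overall cluster version and update status
oc get clusterversion
# Per-operator health; look for AVAILABLE=True, PROGRESSING=False, DEGRADED=False
oc get clusteroperators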
Back at cloudy's Bash shell inside the container, run OpenShift's conformance test suite with:
ansible-playbook playbooks/site.yml \
--limit lvrt-lcl-session-srv-530-okd-abi-ha-provisioner \
--tags jm1.cloudy.openshift_tests
Remove all virtual machines with:
# Note the .home.arpa suffix
for vm in \
lvrt-lcl-session-srv-500-okd-abi-ha-router.home.arpa \
lvrt-lcl-session-srv-501-okd-abi-ha-bmc.home.arpa \
lvrt-lcl-session-srv-510-okd-abi-ha-cp0.home.arpa \
lvrt-lcl-session-srv-511-okd-abi-ha-cp1.home.arpa \
lvrt-lcl-session-srv-512-okd-abi-ha-cp2.home.arpa \
lvrt-lcl-session-srv-520-okd-abi-ha-w0.home.arpa \
lvrt-lcl-session-srv-521-okd-abi-ha-w1.home.arpa \
lvrt-lcl-session-srv-530-okd-abi-ha-provisioner.home.arpa
do
virsh destroy "$vm"
virsh undefine --remove-all-storage --nvram "$vm"
done
The virtual machines can be removed in any order.
Exit cloudy's Bash shell to stop the container.
NOTE: Any virtual machines still running inside the container will be killed!
Finally, remove all Podman containers, networks and volumes with:
sudo DEBUG=yes ./podman-compose.sh down