This procedure installs CSM applications and services into the CSM Kubernetes cluster.

NOTE: Check the information in Known issues before starting this procedure to be aware of possible problems.
NOTE: During this step, on systems with only three worker nodes (typically Testing and Development Systems (TDS)), the `customizations.yaml` file will be automatically edited to lower pod CPU requests for some services, in order to better facilitate scheduling on smaller systems. See the `${CSM_PATH}/tds_cpu_requests.yaml` file for these settings. This file can be modified with different values (prior to executing the `yapl` command below) if other settings are desired in the `customizations.yaml` file for this system. For more information about modifying `customizations.yaml` and tuning for specific systems, see Post-Install Customizations.
1. (`pit#`) Install YAPL.

   ```bash
   rpm -Uvh "${CSM_PATH}"/rpm/cray/csm/sle-15sp2/x86_64/yapl-*.x86_64.rpm
   ```
1. (`pit#`) Install CSM services using YAPL.

   ```bash
   pushd /usr/share/doc/csm/install/scripts/csm_services
   yapl -f install.yaml execute
   popd
   ```
   NOTE
   - This command may take up to 90 minutes to complete.
   - If any errors are encountered, then potential fixes should be displayed where the error occurred.
   - Output is redirected to `/usr/share/doc/csm/install/scripts/csm_services/yapl.log`. To show the output in the terminal, append the `--console-output execute` argument to the `yapl` command.
   - The `yapl` command can safely be rerun. By default, it will skip any steps which were previously completed successfully. To force it to rerun all steps regardless of what was previously completed, append the `--no-cache` argument to the `yapl` command.
1. (`pit#`) Wait for BSS to be ready.

   ```bash
   kubectl -n services rollout status deployment cray-bss
   ```
1. (`pit#`) Retrieve an API token.

   ```bash
   export TOKEN=$(curl -k -s -S -d grant_type=client_credentials \
       -d client_id=admin-client \
       -d client_secret=`kubectl get secrets admin-client-auth -o jsonpath='{.data.client-secret}' | base64 -d` \
       https://api-gw-service-nmn.local/keycloak/realms/shasta/protocol/openid-connect/token | jq -r '.access_token')
   ```
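If the token request fails (for example, because Keycloak is not yet ready), `jq -r '.access_token'` prints the literal string `null` rather than reporting an error, and later steps then fail with a confusing authorization error. The `check_token` helper below is a hypothetical sketch, not part of the CSM tooling, for catching this early:

```bash
# Hypothetical helper (not part of CSM): verify that TOKEN was populated.
# jq -r '.access_token' emits the literal string "null" when the field is
# missing from the response, so treat that the same as an empty value.
check_token() {
    if [ -z "${TOKEN}" ] || [ "${TOKEN}" = "null" ]; then
        echo "ERROR: TOKEN is empty or null; re-run the token retrieval step" >&2
        return 1
    fi
    echo "OK: token is ${#TOKEN} characters long"
}
```

After exporting `TOKEN`, run `check_token` and proceed only if it reports success.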
1. (`pit#`) Create empty boot parameters.

   ```bash
   curl -i -k -H "Authorization: Bearer ${TOKEN}" -X PUT \
       https://api-gw-service-nmn.local/apis/bss/boot/v1/bootparameters \
       --data '{"hosts":["Global"]}'
   ```

   Example of successful output:

   ```text
   HTTP/2 200
   content-type: application/json; charset=UTF-8
   date: Mon, 27 Jun 2022 17:08:55 GMT
   content-length: 0
   x-envoy-upstream-service-time: 7
   server: istio-envoy
   ```
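Because `curl -i` is used, the HTTP status line can also be checked programmatically rather than by eye. The `check_http_200` function below is a hypothetical sketch (not part of the CSM tooling) that inspects the first line of a captured response:

```bash
# Hypothetical helper (not part of CSM): succeed only if the first line of a
# captured "curl -i" response (e.g. "HTTP/2 200") reports status 200.
check_http_200() {
    status=$(printf '%s\n' "$1" | head -n 1 | awk '{print $2}')
    [ "${status}" = "200" ]
}
```

For example, capture the response with `RESPONSE=$(curl -i -s ...)` and then run `check_http_200 "$RESPONSE"` before moving on.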
1. (`pit#`) Restart the `spire-update-bss` job.

   ```bash
   SPIRE_JOB=$(kubectl -n spire get jobs -l app.kubernetes.io/name=spire-update-bss -o name)
   kubectl -n spire get "${SPIRE_JOB}" -o json | jq 'del(.spec.selector)' \
       | jq 'del(.spec.template.metadata.labels."controller-uid")' \
       | kubectl replace --force -f -
   ```
1. (`pit#`) Wait for the `spire-update-bss` job to complete.

   ```bash
   kubectl -n spire wait "${SPIRE_JOB}" --for=condition=complete --timeout=5m
   ```
Wait at least 15 minutes to let the various Kubernetes resources initialize and start before proceeding with the rest of the install. Because there are a number of dependencies between them, some services are not expected to work immediately after the install script completes.
1. After waiting until services are healthy (run `kubectl get po -A | grep -v 'Completed\|Running'` to see which pods may still be `Pending`), take a manual backup of all Etcd clusters. These clusters are automatically backed up every 24 hours, but not until the clusters have been up that long. Taking a manual backup enables restoring from backup later in this install process, if needed.

   ```bash
   /usr/share/doc/csm/scripts/operations/etcd/take-etcd-manual-backups.sh post_install
   ```
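The pod-health check above can be wrapped in a small helper that reports how many pods remain unhealthy and returns a useful exit status. This is a hypothetical sketch; the `unhealthy_pods` name and its output format are not part of CSM:

```bash
# Hypothetical helper: given "kubectl get po -A" output on stdin, count pods
# whose STATUS is neither Running nor Completed. Exits nonzero if any remain.
unhealthy_pods() {
    # tail strips the header row; grep -c exits nonzero when the count is
    # zero, so guard with || true to keep the captured "0".
    count=$(tail -n +2 | grep -cv 'Completed\|Running' || true)
    echo "${count} pods not yet Running or Completed"
    [ "${count}" -eq 0 ]
}
```

Usage would be `kubectl get po -A | unhealthy_pods`, for example inside a watch loop while waiting for services to settle.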
1. The next step is to validate CSM health before redeploying the final NCN. See Validate CSM health before final NCN deployment.
The following error may occur during the Deploy CSM Applications and Services step:

```text
+ csi upload-sls-file --sls-file /var/www/ephemeral/prep/eniac/sls_input_file.json
2021/10/05 18:42:58 Retrieving S3 credentials ( sls-s3-credentials ) for SLS
2021/10/05 18:42:58 Unable to SLS S3 secret from k8s:secrets "sls-s3-credentials" not found
```
1. (`pit#`) Verify that the `sls-s3-credentials` secret exists in the `default` namespace:

   ```bash
   kubectl get secret sls-s3-credentials
   ```

   Example output:

   ```text
   NAME                 TYPE     DATA   AGE
   sls-s3-credentials   Opaque   7      28d
   ```
1. (`pit#`) Check for running `sonar-sync` jobs. If no `sonar-sync` job is currently running, then wait for the next one to start and complete. The `sonar-sync` `CronJob` is responsible for copying the `sls-s3-credentials` secret from the `default` namespace to the `services` namespace.

   ```bash
   kubectl -n services get pods -l cronjob-name=sonar-sync
   ```

   Example output:

   ```text
   NAME                          READY   STATUS      RESTARTS   AGE
   sonar-sync-1634322840-4fckz   0/1     Completed   0          73s
   sonar-sync-1634322900-pnvl6   1/1     Running     0          13s
   ```
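The wait for `sonar-sync` to copy the secret can be automated instead of polled by hand. The `wait_for` helper below is a hypothetical sketch (not part of CSM) that retries a command until it succeeds or a timeout expires:

```bash
# Hypothetical helper: run a command every 5 seconds until it succeeds,
# giving up after the timeout (in seconds) passed as the first argument.
wait_for() {
    timeout="$1"
    shift
    elapsed=0
    until "$@"; do
        if [ "${elapsed}" -ge "${timeout}" ]; then
            echo "Timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep 5
        elapsed=$((elapsed + 5))
    done
}
```

For example, `wait_for 300 kubectl -n services get secret sls-s3-credentials` polls until the secret appears in the `services` namespace (the 300-second timeout is an arbitrary choice, and `kubectl get` exits nonzero while the secret does not exist).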
1. (`pit#`) Verify that the `sls-s3-credentials` secret now exists in the `services` namespace.

   ```bash
   kubectl -n services get secret sls-s3-credentials
   ```

   Example output:

   ```text
   NAME                 TYPE     DATA   AGE
   sls-s3-credentials   Opaque   7      20s
   ```
1. Running the `yapl` command again is expected to succeed.
Known potential issues along with suggested fixes are listed in Troubleshoot Nexus.