Commit

Merge branch 'release/1.6' into feature/CRAYSAT-1740
haasken-hpe committed Jul 25, 2024
2 parents 7fc91a8 + cfd668d commit 5f8c7af
Showing 58 changed files with 2,699 additions and 501 deletions.
1 change: 0 additions & 1 deletion api/README.md
Original file line number Diff line number Diff line change
@@ -14,4 +14,3 @@
* [Hardware State Manager API v2](./smd.md)
* [Cray STS Token Generator v1](./sts.md)
* [TAPMS Tenant Status API v1](./tapms-operator.md)
* [User Access Service v1](./uas-mgr.md)
56 changes: 0 additions & 56 deletions api/sls.md
@@ -329,14 +329,6 @@ Status Code **200**

*xor*

|Name|Type|Required|Restrictions|Description|
|---|---|---|---|---|
|»» *anonymous*|[hardware_comptype_virtual_node](#schemahardware_comptype_virtual_node)|false|none|none|
|»»» NodeType|string|true|none|The role type assigned to this node.|
|»»» nid|integer|false|none|none|

*xor*

|Name|Type|Required|Restrictions|Description|
|---|---|---|---|---|
|»» *anonymous*|[hardware_ip_and_creds_optional](#schemahardware_ip_and_creds_optional)|false|none|none|
@@ -357,7 +349,6 @@ Status Code **200**
|NodeType|Application|
|NodeType|Storage|
|NodeType|Management|
|NodeType|Management|

<aside class="warning">
To perform this operation, you must be authenticated by means of one of the following methods:
@@ -1027,14 +1018,6 @@ Status Code **200**

*xor*

|Name|Type|Required|Restrictions|Description|
|---|---|---|---|---|
|»» *anonymous*|[hardware_comptype_virtual_node](#schemahardware_comptype_virtual_node)|false|none|none|
|»»» NodeType|string|true|none|The role type assigned to this node.|
|»»» nid|integer|false|none|none|

*xor*

|Name|Type|Required|Restrictions|Description|
|---|---|---|---|---|
|»» *anonymous*|[hardware_ip_and_creds_optional](#schemahardware_ip_and_creds_optional)|false|none|none|
@@ -1055,7 +1038,6 @@ Status Code **200**
|NodeType|Application|
|NodeType|Storage|
|NodeType|Management|
|NodeType|Management|

<aside class="warning">
To perform this operation, you must be authenticated by means of one of the following methods:
@@ -1620,9 +1602,6 @@ sls_dump:
|»»»»» *anonymous*|body|[hardware_comptype_node](#schemahardware_comptype_node)|false|none|
|»»»»»» NodeType|body|string|true|The role type assigned to this node.|
|»»»»»» nid|body|integer|false|none|
|»»»»» *anonymous*|body|[hardware_comptype_virtual_node](#schemahardware_comptype_virtual_node)|false|none|
|»»»»»» NodeType|body|string|true|The role type assigned to this node.|
|»»»»»» nid|body|integer|false|none|
|»»»»» *anonymous*|body|[hardware_ip_and_creds_optional](#schemahardware_ip_and_creds_optional)|false|none|
|»»»»»» IP6addr|body|string|false|The ipv6 address that should be assigned to this BMC, or "DHCPv6". If omitted, "DHCPv6" is assumed.|
|»»»»»» IP4addr|body|string|false|The ipv4 address that should be assigned to this BMC, or "DHCPv4". If omitted, "DHCPv4" is assumed.|
@@ -1668,7 +1647,6 @@ sls_dump:
|»»»»»» NodeType|Application|
|»»»»»» NodeType|Storage|
|»»»»»» NodeType|Management|
|»»»»»» NodeType|Management|

<h3 id="post__loadstate-responses">Responses</h3>

@@ -3512,34 +3490,6 @@ The human-readable time this object was last created or updated.
|NodeType|Storage|
|NodeType|Management|

<h2 id="tocS_hardware_comptype_virtual_node">hardware_comptype_virtual_node</h2>
<!-- backwards compatibility -->
<a id="schemahardware_comptype_virtual_node"></a>
<a id="schema_hardware_comptype_virtual_node"></a>
<a id="tocShardware_comptype_virtual_node"></a>
<a id="tocshardware_comptype_virtual_node"></a>

```json
{
"NodeType": "Management",
"nid": "2"
}

```

### Properties

|Name|Type|Required|Restrictions|Description|
|---|---|---|---|---|
|NodeType|string|true|none|The role type assigned to this node.|
|nid|integer|false|none|none|

#### Enumerated Values

|Property|Value|
|---|---|
|NodeType|Management|

<h2 id="tocS_hardware_comptype_nodecard">hardware_comptype_nodecard</h2>
<!-- backwards compatibility -->
<a id="schemahardware_comptype_nodecard"></a>
@@ -3647,12 +3597,6 @@ xor

xor

|Name|Type|Required|Restrictions|Description|
|---|---|---|---|---|
|*anonymous*|[hardware_comptype_virtual_node](#schemahardware_comptype_virtual_node)|false|none|none|

xor

|Name|Type|Required|Restrictions|Description|
|---|---|---|---|---|
|*anonymous*|[hardware_comptype_nodecard](#schemahardware_comptype_nodecard)|false|none|none|
78 changes: 20 additions & 58 deletions install/deploy_final_non-compute_node.md
@@ -190,70 +190,34 @@ The steps in this section load hand-off data before a later procedure reboots th

It is important to back up some files from `ncn-m001` before it is rebooted.

1. (`pit#`) Set up passwordless SSH **to** the PIT node from `ncn-m002`.

> The `ssh` command below may prompt for the NCN root password.

```bash
ssh ncn-m002 cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys &&
chmod 600 /root/.ssh/authorized_keys
```

1. (`pit#`) Stop the typescript session.

```bash
exit
```

1. (`pit#`) Preserve logs and configuration files if desired.
1. (`pit#`) Create PIT backup and copy it off.

The following commands create a `tar` archive of select files on the PIT node. This archive is located
in a directory that will be backed up in the next steps.
This script creates a backup of select files on the PIT node, copying them to both
another master NCN and to S3.

> The script below may prompt for the NCN root password.

```bash
mkdir -pv "${PITDATA}"/prep/logs &&
ls -d \
/etc/dnsmasq.d \
/etc/os-release \
/etc/sysconfig/network \
/opt/cray/tests/cmsdev.log \
/opt/cray/tests/install/logs \
/opt/cray/tests/logs \
/root/.canu \
/root/.config/cray/logs \
/root/csm*.{log,txt} \
/tmp/*.log \
/usr/share/doc/csm/install/scripts/csm_services/yapl.log \
/var/log/conman \
/var/log/zypper.log 2>/dev/null |
sed 's_^/__' |
xargs tar -C / -czvf "${PITDATA}/prep/logs/pit-backup-$(date +%Y-%m-%d_%H-%M-%S).tgz"
/usr/share/doc/csm/install/scripts/backup-pit-data.sh
```

1. (`pit#`) Copy some of the installation files to `ncn-m002`.

These files will be copied back to `ncn-m001` after the PIT node is rebooted.
Ensure that the script output ends with `COMPLETED`, indicating that the procedure was successful.

```bash
ssh ncn-m002 \
"mkdir -pv /metal/bootstrap
rsync -e 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null' -rltD -P --delete pit.nmn:'${PITDATA}'/prep /metal/bootstrap/
rsync -e 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null' -rltD -P --delete pit.nmn:'${CSM_PATH}'/images/pre-install-toolkit/pre-install-toolkit*.iso /metal/bootstrap/"
```
1. In the output of the script run in the previous step, note the value it reports for the `first-master-hostname`.
This will be needed in a later step.

1. (`pit#`) Upload install files to S3 in the cluster.
Example output excerpt:

```bash
PITBackupDateTime=$(date +%Y-%m-%d_%H-%M-%S)
tar -czvf "${PITDATA}/PitPrepIsoConfigsBackup-${PITBackupDateTime}.tgz" "${PITDATA}/prep" "${PITDATA}/configs" "${CSM_PATH}/images/pre-install-toolkit/pre-install-toolkit"*.iso &&
cray artifacts create config-data \
"PitPrepIsoConfigsBackup-${PITBackupDateTime}.tgz" \
"${PITDATA}/PitPrepIsoConfigsBackup-${PITBackupDateTime}.tgz" &&
rm -v "${PITDATA}/PitPrepIsoConfigsBackup-${PITBackupDateTime}.tgz" && echo COMPLETED
```text
first-master-hostname: ncn-m002
```

Ensure that the previous command chain output ends with `COMPLETED`, indicating that the procedure was successful.
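
The two things an installer needs from the backup script's output are the final `COMPLETED` line and the reported `first-master-hostname`. The following is a minimal sketch of how both could be checked from saved output. The helper names `get_first_master` and `backup_completed`, and the output path in the usage comment, are hypothetical and are not part of the CSM documentation or of `backup-pit-data.sh`.

```shell
# Hypothetical helpers for inspecting saved output of backup-pit-data.sh.

# Print the last "first-master-hostname" value found in the file given as $1.
get_first_master() {
    sed -n 's/^first-master-hostname: //p' "$1" | tail -n 1
}

# Succeed only if the last line of the file given as $1 is exactly COMPLETED.
backup_completed() {
    [ "$(tail -n 1 "$1")" = "COMPLETED" ]
}

# Possible usage (the output path is an assumption; the script does not create it):
#   /usr/share/doc/csm/install/scripts/backup-pit-data.sh 2>&1 | tee /root/backup-pit-data.out
#   backup_completed /root/backup-pit-data.out && FM=$(get_first_master /root/backup-pit-data.out)
```

When output is piped through `tee`, the pipeline's exit status is normally that of `tee`, so checking for the `COMPLETED` line is more reliable than checking the exit code.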

## 4. Reboot

1. (`external#`) Open a serial console to the PIT node, if one is not already open.
@@ -327,13 +291,15 @@ It is important to backup some files from `ncn-m001` before it is rebooted.
1. (`ncn-m001#`) Restore and verify the site link.

Restore networking files from the manual backup taken during the
[Backup](#33-backup) step.
[Backup](#33-backup) step. Set the `FM` variable to the `first-master-hostname`
value noted in that section.

> **`NOTE`** Do NOT change any default NCN hostname; otherwise, unexpected deployment or upgrade errors may happen.

```bash
SYSTEM_NAME=eniac
rsync "ncn-m002:/metal/bootstrap/prep/${SYSTEM_NAME}/pit-files/ifcfg-lan0" /etc/sysconfig/network/ && \
FM=ncn-m002
rsync "${FM}:/metal/bootstrap/prep/${SYSTEM_NAME}/pit-files/ifcfg-lan0" /etc/sysconfig/network/ && \
wicked ifreload lan0 && \
wicked ifstatus lan0
```
@@ -378,19 +344,15 @@ It is important to backup some files from `ncn-m001` before it is rebooted.
exit
```

1. (`ncn-m002#`) Copy install files back to `ncn-m001`.
1. If `ncn-m002` is not the `first-master-hostname` noted in the [Backup](#33-backup) step, then SSH to that node.

```bash
rsync -rltDv -P /metal/bootstrap ncn-m001:/metal/ && rm -rfv /metal/bootstrap
```

1. (`ncn-m002#`) Log out of `ncn-m002`.
1. (`first-master-hostname#`) Copy install files back to `ncn-m001`.

```bash
exit
rsync -rltDv -P /metal/bootstrap ncn-m001:/metal/ && rm -rfv /metal/bootstrap
```

1. Log in to `ncn-m001`.
1. Log out of the other nodes and log in to `ncn-m001`.

SSH back into `ncn-m001` or log in at the console.

139 changes: 139 additions & 0 deletions install/scripts/backup-pit-data.sh
@@ -0,0 +1,139 @@
#!/bin/bash
#
# MIT License
#
# (C) Copyright 2024 Hewlett Packard Enterprise Development LP
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
#

set -euo pipefail

# With nullglob set, a pattern such as /tmp/*.log expands to nothing (instead of the literal pattern) when no files match
shopt -s nullglob

# This script is a replacement for the steps that were previously done manually
# during the "Deploy Final NCN" step of CSM installs.

function err_exit {
echo "ERROR: $*" >&2
exit 1
}

function dir_exists {
[[ -e $1 ]] || err_exit "Directory '$1' does not exist"
[[ -d $1 ]] || err_exit "'$1' exists but is not a directory"
}

function run_cmd {
echo "# $*"
"$@" || err_exit "Command failed with exit code $?: $*"
}

# Ensure that PITDATA and CSM_PATH variables are set
[[ -v PITDATA && -n ${PITDATA} ]] || err_exit "PITDATA variable must be set"
[[ -v CSM_PATH && -n ${CSM_PATH} ]] || err_exit "CSM_PATH variable must be set"

# Make sure that expected directories exist and are actually directories
for DIR in "${PITDATA}" "${PITDATA}/prep" "${PITDATA}/configs" "${CSM_PATH}" \
"${CSM_PATH}/images" "${CSM_PATH}/images/pre-install-toolkit"; do

dir_exists "${DIR}"

done

PIT_ISO_DIR="${CSM_PATH}/images/pre-install-toolkit"

# Make sure that expected PIT iso file can be found
compgen -G "${PIT_ISO_DIR}/pre-install-toolkit*.iso" > /dev/null 2>&1 || err_exit "PIT ISO file (${PIT_ISO_DIR}/pre-install-toolkit*.iso) not found"

# Make sure we can figure out the first master node
DATA_JSON="${PITDATA}/configs/data.json"
[[ -e ${DATA_JSON} ]] || err_exit "File does not exist: '${DATA_JSON}'"
[[ -f ${DATA_JSON} ]] || err_exit "Exists but is not a regular file: '${DATA_JSON}'"
[[ -s ${DATA_JSON} ]] || err_exit "File exists but is empty: '${DATA_JSON}'"

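# jq -r prints the literal string "null" for a missing key; '// empty' turns that into empty output so the check below catches it
FM=$(jq -r '."Global"."meta-data"."first-master-hostname" // empty' < "${DATA_JSON}") || err_exit "Error getting first-master-hostname from '${DATA_JSON}'"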
[[ -n ${FM} ]] || err_exit "No first-master-hostname found in '${DATA_JSON}'"
echo "first-master-hostname: $FM"

# Set up passwordless SSH **to** the PIT node from the first-master node
echo "If prompted, enter the $(whoami) password for ${FM}"
ssh "${FM}" cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys || err_exit "Unable to read ${FM}:/root/.ssh/id_rsa.pub and/or write to /root/.ssh/authorized_keys"
run_cmd chmod 600 /root/.ssh/authorized_keys

# Okay, everything seems good
run_cmd mkdir -pv "${PITDATA}"/prep/logs

# Because some of these files are log files that are changing during this procedure, any call to directly
# tar them may result in the tar command failing. Thus, we first copy all of these files into a temporary
# directory, and from there we create the tar archive

TEMPDIR=$(mktemp -d) || err_exit "Command failed: mktemp -d"

echo "Copying selected files to temporary directory"

for BACKUP_TARGET in \
/etc/conman.conf \
/etc/dnsmasq.d \
/etc/os-release \
/etc/sysconfig/network \
/opt/cray/tests/cmsdev.log \
/opt/cray/tests/install/logs \
/opt/cray/tests/logs \
/root/.bash_history \
/root/.canu \
/root/.config/cray/logs \
/root/csm*.{log,txt} \
/tmp/*.log \
/usr/share/doc/csm/install/scripts/csm_services/yapl.log \
/var/log; do

[[ -e ${BACKUP_TARGET} ]] || continue
DIRNAME=$(dirname "${BACKUP_TARGET}")
TARG_DIR="${TEMPDIR}${DIRNAME}"
run_cmd mkdir -pv "${TARG_DIR}"
run_cmd cp -pr "${BACKUP_TARGET}" "${TARG_DIR}"

done

echo "Creating PIT backup tarfile"

pushd "${TEMPDIR}"
run_cmd tar -czvf "${PITDATA}/prep/logs/pit-backup-$(date +%Y-%m-%d_%H-%M-%S).tgz" --remove-files *
popd
run_cmd rmdir -v "${TEMPDIR}"

echo "Copying files to ${FM}"
ssh "${FM}" \
"mkdir -pv /metal/bootstrap &&
rsync -e 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null' -rltD -P --delete pit.nmn:'${PITDATA}'/prep /metal/bootstrap/ &&
rsync -e 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null' -rltD -P --delete pit.nmn:'${PIT_ISO_DIR}'/pre-install-toolkit*.iso /metal/bootstrap/"

PITBackupDateTime=$(date +%Y-%m-%d_%H-%M-%S)
run_cmd tar -czvf "${PITDATA}/PitPrepIsoConfigsBackup-${PITBackupDateTime}.tgz" "${PITDATA}/prep" "${PITDATA}/configs" "${PIT_ISO_DIR}/pre-install-toolkit"*.iso
run_cmd cray artifacts create config-data \
"PitPrepIsoConfigsBackup-${PITBackupDateTime}.tgz" \
"${PITDATA}/PitPrepIsoConfigsBackup-${PITBackupDateTime}.tgz"
run_cmd rm -v "${PITDATA}/PitPrepIsoConfigsBackup-${PITBackupDateTime}.tgz"

# Since the installer needs to take note of this value, we will display it again here at the end of the script
echo "first-master-hostname: $FM"

echo COMPLETED
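
The script's comments explain that log files may change while they are being archived, so it copies them to a temporary directory first and archives the copy. The following is a minimal standalone sketch of that copy-then-archive pattern. The function name `snapshot_and_tar` is hypothetical and the code is not taken from `backup-pit-data.sh`; unlike the script, it uses `tar -C` instead of `pushd`, and it assumes every target is given as an absolute path.

```shell
# Hypothetical helper: snapshot_and_tar ARCHIVE PATH...
# Copies each existing absolute PATH into a scratch directory, preserving its
# location relative to /, then archives the scratch copy instead of the live files.
# The function body runs in a subshell, so its variables do not leak to the caller.
snapshot_and_tar() (
    archive=$1
    shift
    scratch=$(mktemp -d) || exit 1
    for target in "$@"; do
        # Skip paths that do not exist, as the script does
        [ -e "$target" ] || continue
        dest="${scratch}$(dirname "$target")"
        mkdir -p "$dest" || exit 1
        cp -pr "$target" "$dest" || exit 1
    done
    # Archive the stable copy, with member names relative to the scratch root
    tar -C "$scratch" -czf "$archive" . || exit 1
    rm -rf "$scratch"
)

# Possible usage (paths are illustrative only):
#   snapshot_and_tar /root/pit-logs.tgz /var/log/conman /etc/os-release
```

Archiving a copy trades some temporary disk space for a `tar` run that cannot fail because a source file changed while it was being read.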