The Boot Orchestration Service (BOS) is responsible for booting, configuring, and shutting down collections of nodes. This is accomplished using BOS components, such as boot orchestration session templates and sessions, and by launching a Boot Orchestration Agent (BOA) that fulfills boot requests.
BOS users create a BOS session template via the REST API. A session template is a collection of metadata for a group of nodes and their desired boot artifacts and configuration. A BOS session can then be created by applying an action to a session template. The available actions are boot, reboot, shutdown, and configure. BOS will create a Kubernetes BOA job to apply an action. BOA coordinates with the underlying subsystems to complete the action requested. The session can be monitored to determine the status of the request.
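The relationship between a session template, an action, and a session can be sketched as a small request builder. This is a minimal illustration, not the BOS client: the four actions come from the text above, while the function name and the body field names `operation` and `templateUuid` are assumptions that should be checked against the BOS API specification for the installed release.

```python
# Actions that can be applied to a session template, as listed above.
VALID_ACTIONS = ("boot", "reboot", "shutdown", "configure")


def build_session_request(template_name: str, action: str) -> dict:
    """Build a request body for creating a BOS session.

    The field names "operation" and "templateUuid" are assumptions
    made for illustration; verify them against the BOS API spec.
    """
    if not template_name:
        raise ValueError("a session template name is required")
    if action not in VALID_ACTIONS:
        raise ValueError(
            f"unknown action {action!r}; expected one of {', '.join(VALID_ACTIONS)}"
        )
    return {"operation": action, "templateUuid": template_name}


if __name__ == "__main__":
    # "compute-template" is a hypothetical session template name.
    print(build_session_request("compute-template", "reboot"))
```

Validating the action before sending the request keeps an invalid value from reaching the API, where it would otherwise only fail after a round trip.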
BOS depends on each of the following services to complete its tasks:
- BOA - Handles any action type submitted to the BOS API. BOA jobs are created and launched by BOS.
- Boot Script Service (BSS) - Stores the configuration information that is used to boot each hardware component. Nodes consult BSS for their boot artifacts and boot parameters when they boot or reboot.
- Configuration Framework Service (CFS) - BOA launches CFS to apply configuration to the nodes in its boot sets (node personalization).
- Cray Advanced Platform Monitoring and Control (CAPMC) - Powers nodes on and off.
- Hardware State Manager (HSM) - Tracks the state of each node and the groups and roles each node belongs to.
BOS can be driven with Cray CLI commands. The following command returns the latest API version information:
ncn-m001# cray bos list
[[results]]
major = "1"
minor = "0"
patch = "0"
[[results.links]]
href = "https://api-gw-service-nmn.local/apis/bos/v1"
rel = "self"
The following changes to the BOS API are planned for the upcoming CSM-1.2.0 release:
- The `--template-body` option for the Cray CLI `bos` command will be deprecated.
- Performing a GET on the session status for a boot set (that is, /v1/session/{session_id}/status/{boot_set_name}) currently returns a status code of 201, but it should return 200. This will be corrected to return 200.
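Scripts that poll boot set status may want to tolerate both status codes across the upgrade. The sketch below shows one way to do that, assuming the gateway base URL from the command output earlier in this section. The helper names are hypothetical, and no request is sent; the code only builds the URL and classifies a response code.

```python
from urllib.parse import quote

# Base URL taken from the `cray bos list` output shown earlier.
BOS_BASE_URL = "https://api-gw-service-nmn.local/apis/bos/v1"

# 201 is returned today; 200 is the corrected code planned for CSM-1.2.0.
ACCEPTED_STATUS_CODES = frozenset({200, 201})


def boot_set_status_url(session_id: str, boot_set_name: str,
                        base_url: str = BOS_BASE_URL) -> str:
    """Build the URL for the status of one boot set in a session."""
    # Percent-encode each path segment so unusual names stay valid.
    session = quote(session_id, safe="")
    boot_set = quote(boot_set_name, safe="")
    return f"{base_url}/session/{session}/status/{boot_set}"


def is_successful_status_response(status_code: int) -> bool:
    """Treat both the current (201) and corrected (200) codes as success."""
    return status_code in ACCEPTED_STATUS_CODES


if __name__ == "__main__":
    # "abc-123" and "computes" are hypothetical identifiers.
    print(boot_set_status_url("abc-123", "computes"))
    print(is_successful_status_response(201))
```

Accepting both codes lets the same script run unchanged before and after the correction; once every system is on the corrected release, the check can be narrowed to 200.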
The procedures in this section include the information required to boot, configure, and shut down collections of nodes with BOS.
- BOS Workflows
- BOS Session Templates
- BOS Sessions
- Manage a BOS Session
- View the Status of a BOS Session
- Limit the Scope of a BOS Session
- Configure the BOS Timeout When Booting Compute Nodes
- Check the Progress of BOS Session Operations
- Kernel Boot Parameters
- Clean Up Logs After a BOA Kubernetes Job
- Clean Up After a BOS/BOA Job is Completed or Cancelled
- Troubleshoot UAN Boot Issues
- Troubleshoot Booting Nodes with Hardware Issues
- BOS Limitations for Gigabyte BMC Hardware
- Stage Changes without BOS
- Compute Node Boot Sequence
- Healthy Compute Node Boot Process
- Node Boot Root Cause Analysis
- Compute Node Boot Issue Symptom: Duplicate Address Warnings and Declined DHCP Offers in Logs
- Compute Node Boot Issue Symptom: Node is Not Able to Download the Required Artifacts
- Compute Node Boot Issue Symptom: Message About Invalid EEPROM Checksum in Node Console or Log
- Boot Issue Symptom: Node HSN Interface Does Not Appear or Show Detected Links
- Compute Node Boot Issue Symptom: Node Console or Logs Indicate that the Server Response has Timed Out
- Tools for Resolving Compute Node Boot Issues
- Troubleshoot Compute Node Boot Issues Related to Unified Extensible Firmware Interface (UEFI)
- Troubleshoot Compute Node Boot Issues Related to Dynamic Host Configuration Protocol (DHCP)
- Troubleshoot Compute Node Boot Issues Related to the Boot Script Service
- Troubleshoot Compute Node Boot Issues Related to Trivial File Transfer Protocol (TFTP)
- Troubleshoot Compute Node Boot Issues Using Kubernetes
- Log File Locations and Ports Used in Compute Node Boot Troubleshooting
- Edit the iPXE Embedded Boot Script
- Redeploy the iPXE and TFTP Services
- Upload Node Boot Information to Boot Script Service (BSS)