Deployment Manager, i.e. the Scheduling Abstraction Layer (SAL), is an abstraction layer initially developed by Activeeon as part of the EU project Morphemic. Its development continued through the NebulOuS EU project. SAL aims to make the Execution Adapter, i.e. ProActive Scheduler & Resource Manager, easier to use by providing an abstraction over it, so that users can interact with the scheduler and take advantage of its features more easily. SAL supports REST calls and communicates directly with the Execution Adapter, giving users access to the scheduler's capabilities.
SAL Code repository and documentation can be found here.
If you encounter an issue, please create a bug report here. For faster resolution of your problem, please include the following when reporting an issue:
- a description of the scenario, e.g. the NebulOuS sequence diagrams that were executed
- date and time, SAL and ProActive environment
- SAL logs (especially those inside the container)
- ProActive logs, i.e. connector-iaas.log
- ProActive job id (in case of an error during the ProActive workflow execution)
- a detailed description of the action during which the issue occurred
Note that additional documentation for NebulOuS development is provided here. For the preset NebulOuS environment for testing and development, you can find more information on how to access SAL here, and on how to access ProActive here.
This section describes how the Deployment Manager and Execution Adapter support the NebulOuS scenario. It outlines the sequence of SAL operations provided to facilitate NebulOuS deployment and execution. For more information on using SAL endpoints, refer to the SAL Endpoint Documentation.
Developers can utilize the provided Postman collection to get started with the endpoints or consult the previous documentation for further details on the testing scenario.
To use SAL, you must have the Execution Adapter (ProActive) installed and properly configured. In the configuration script, you only need to set:
<PROACTIVE_URL>
<USERNAME>
<PASSWORD>
The rest of the configuration is automatically handled by NebulOuS (see NebulOuS SAL deployment for more details).
For additional information on setting up the SAL Kubernetes deployment script, refer to this guide. You can find details on using the endpoints here.
1.1. Connect endpoint - Establishing the connection to ProActive server.
SAL must be connected to ProActive to use any of the endpoints. If you encounter an HTTP 500 when calling endpoints, which reports a NotConnectedException, it indicates that SAL is not connected to ProActive. You can verify this in the SAL logs (particularly those within the container).
Keep in mind that the connection to ProActive may be lost during scenario execution and may need to be reestablished.
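Because the connection can drop mid-scenario, a client may want to reconnect and retry when it sees this error. The following is a minimal sketch, not part of SAL: `NotConnectedError`, `is_not_connected`, and `call_with_reconnect` are hypothetical names, and it assumes the NotConnectedException text is visible in the HTTP 500 response body. The `call` and `connect` arguments stand for whatever functions your client uses to call an endpoint and the Connect endpoint.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class NotConnectedError(Exception):
    """Hypothetical client-side error raised when SAL reports NotConnectedException."""


def is_not_connected(status_code: int, body: str) -> bool:
    # Assumption: the exception name appears in the body of the HTTP 500 response.
    return status_code == 500 and "NotConnectedException" in body


def call_with_reconnect(
    call: Callable[[], T],
    connect: Callable[[], None],
    retries: int = 1,
) -> T:
    """Run `call`; on NotConnectedError, call `connect` and retry up to `retries` times."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return call()
        except NotConnectedError:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            connect()  # re-establish the SAL-to-ProActive connection
    raise AssertionError("unreachable")
```

A caller would translate a response into `NotConnectedError` using `is_not_connected` and wrap each endpoint call in `call_with_reconnect`.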
2.1. Add cloud endpoint - Defining a cloud infrastructure.
To use this endpoint, you must specify a unique cloud_name that has not already been registered. Note that after a SAL restart, cloud information is erased from the SAL database, though it remains in the Execution Adapter. If you use a cloud_name that has already been registered, the infrastructure will not be updated with new information, and resources on the cloud provider may not be properly released. The only proper way to remove cloud resources is by using the Cloud deregistration endpoint.
For more information on setting up cloud providers for NebulOuS, refer to the Managing Cloud Providers documentation.
Additionally, an infrastructure that appears registered is not guaranteed to be correctly configured. Once registration is complete, an asynchronous process begins to retrieve images and node candidates, and the result of this process shows whether the provided authentication is valid (see how the isAnyAsyncNodeCandidatesProcessesInProgress and GetCloudImages endpoints can be used for validation). Note that SSH credentials are only used during Cluster Deployment.
2.2. isAnyAsyncNodeCandidatesProcessesInProgress endpoint - Checking for ongoing asynchronous processes for retrieving cloud images or node candidates.
You should wait until this endpoint returns false, indicating that the retrieval of cloud images and node candidates from the cloud provider is complete.
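Waiting for this endpoint to return false is a simple polling loop. The sketch below is illustrative only: `wait_for_node_candidates` is a hypothetical helper, `in_progress` stands for whatever function your client uses to call isAnyAsyncNodeCandidatesProcessesInProgress and parse its boolean result, and the interval and timeout defaults are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_node_candidates(
    in_progress: Callable[[], bool],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until `in_progress()` returns False.

    Returns True when retrieval has finished, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if not in_progress():
            return True  # images and node candidates have been retrieved
        if time.monotonic() >= deadline:
            return False  # still running; the caller decides how to react
        sleep(poll_seconds)
```

Once this returns True, the GetCloudImages and findNodeCandidates endpoints can be expected to reflect the registered cloud.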
2.3. GetCloudImages endpoint - Retrieving cloud images.
This endpoint can be used to verify that the cloud images and authentication settings are correct. If there is a problem with authentication, the endpoint will return an error. For issues related to incorrect credentials or insufficient permissions, consult the Execution Adapter logs. If an image retrieval problem occurs, the image will not be returned by this endpoint.
3.1. RegisterNewEdgeNode endpoint - Registering a New Edge Device.
This endpoint is used to register a new edge device. Upon successful registration, it returns the defined edge node structure, the unique edge device ID, and the node candidate ID representing this device.
Note that during this process, the device is only registered with its associated information, while validation occurs during the actual Cluster Deployment, which uses the registered edge node. To fully deregister an edge device, you must use the Edge Deregistration endpoint, which ensures proper removal from the system.
3.2. GetEdgeNodes endpoint - Retrieving All Registered Edge Devices.
This endpoint retrieves all registered edge devices, providing all information initially returned during the device registration process.
4.1. findNodeCandidates endpoint - Filtering Node Candidates Based on Deployment Requirements.
This endpoint allows you to filter node candidates using various criteria to select suitable nodes for deployment. Specify the required conditions for master or worker nodes within the cluster and store the retrieved node candidate IDs for future use.
In NebulOuS, there are only two node types: IAAS for cloud nodes, and EDGE for nodes representing edge devices.
Example of Searching for Node Candidates in an OpenStack Cloud:
- Node Type: IAAS (cloud node)
- Cloud ID: Matches a specific cloud (use {{cloud_name}} to reference)
- Operating System: Ubuntu, version 22
- Region: bgo
- Hardware Specifications: 8GB RAM and 4 CPU cores
[
  {
    "type": "NodeTypeRequirement",
    "nodeTypes": ["IAAS"]
  },
  {
    "type": "AttributeRequirement",
    "requirementClass": "cloud",
    "requirementAttribute": "id",
    "requirementOperator": "EQ",
    "value": "{{cloud_name}}"
  },
  {
    "type": "AttributeRequirement",
    "requirementClass": "image",
    "requirementAttribute": "operatingSystem.family",
    "requirementOperator": "IN",
    "value": "UBUNTU"
  },
  {
    "type": "AttributeRequirement",
    "requirementClass": "image",
    "requirementAttribute": "name",
    "requirementOperator": "INC",
    "value": "22"
  },
  {
    "type": "AttributeRequirement",
    "requirementClass": "location",
    "requirementAttribute": "name",
    "requirementOperator": "EQ",
    "value": "bgo"
  },
  {
    "type": "AttributeRequirement",
    "requirementClass": "hardware",
    "requirementAttribute": "ram",
    "requirementOperator": "EQ",
    "value": "8192"
  },
  {
    "type": "AttributeRequirement",
    "requirementClass": "hardware",
    "requirementAttribute": "cores",
    "requirementOperator": "EQ",
    "value": "4"
  }
]
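When the same filter is needed for several clouds or hardware sizes, the request body above can be generated rather than written by hand. The following is a small sketch: the helper names (`node_type`, `attribute`, `openstack_requirements`) are hypothetical, while the field names, operators, and values are copied from the example above.

```python
import json
from typing import Any


def node_type(*node_types: str) -> dict[str, Any]:
    """Build a NodeTypeRequirement, e.g. node_type("IAAS")."""
    return {"type": "NodeTypeRequirement", "nodeTypes": list(node_types)}


def attribute(req_class: str, attr: str, operator: str, value: str) -> dict[str, str]:
    """Build an AttributeRequirement with the field names used in the example."""
    return {
        "type": "AttributeRequirement",
        "requirementClass": req_class,
        "requirementAttribute": attr,
        "requirementOperator": operator,
        "value": value,
    }


def openstack_requirements(
    cloud_name: str, region: str, ram_mb: int, cores: int
) -> list[dict[str, Any]]:
    """Reproduce the OpenStack example: Ubuntu 22 images in one region and cloud."""
    return [
        node_type("IAAS"),
        attribute("cloud", "id", "EQ", cloud_name),
        attribute("image", "operatingSystem.family", "IN", "UBUNTU"),
        attribute("image", "name", "INC", "22"),
        attribute("location", "name", "EQ", region),
        # Numeric values are sent as strings, as in the example body.
        attribute("hardware", "ram", "EQ", str(ram_mb)),
        attribute("hardware", "cores", "EQ", str(cores)),
    ]


if __name__ == "__main__":
    # "my-cloud" is a placeholder for a registered cloud_name.
    print(json.dumps(openstack_requirements("my-cloud", "bgo", 8192, 4), indent=2))
```

The serialized list is what would be sent as the body of the findNodeCandidates request.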
Example of Searching for a Node Candidate Representing an EDGE Device:
[
  {
    "type": "NodeTypeRequirement",
    "nodeTypes": ["EDGE"]
  }
]
Note that for EDGE devices, the node candidate ID is returned during registration. To target a specific edge device, either store this ID during the registration process, or include a unique identifier in the device name, which can then be searched for using an attribute requirement on name in the hardware class.
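As an illustration of the second option, the sketch below builds a filter that combines the EDGE node type with a name requirement in the hardware class. `edge_requirements_by_name` is a hypothetical helper, and the use of the INC operator for a substring match is an assumption inferred from the image-name filter in the OpenStack example; check the SAL endpoint documentation for the operator that suits your naming scheme.

```python
from typing import Any


def edge_requirements_by_name(device_tag: str) -> list[dict[str, Any]]:
    """Filter EDGE node candidates whose hardware name contains `device_tag`."""
    if not device_tag:
        raise ValueError("device_tag must not be empty")
    return [
        {"type": "NodeTypeRequirement", "nodeTypes": ["EDGE"]},
        {
            "type": "AttributeRequirement",
            "requirementClass": "hardware",
            "requirementAttribute": "name",
            # Assumption: INC performs a "contains" match, as in the image-name example.
            "requirementOperator": "INC",
            "value": device_tag,
        },
    ]
```

Storing the node candidate ID returned at registration remains the more direct option, since it avoids relying on a naming convention.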
4.2. getLengthOfNodeCandidates endpoint - Returns total number of existing node candidates.
5.1. DefineCluster endpoint - Defining Kubernetes cluster.
This endpoint is used to define and configure Kubernetes cluster deployments. When setting up a Kubernetes cluster using this endpoint, scripts maintained by NebulOuS developers streamline the deployment process by installing essential software components within the cluster. These scripts and other parts of the deployment workflow can be debugged and tested using ProActive workflows, enabling seamless integration and troubleshooting.
The script templates provided by SAL offer predefined structures for deployment, allowing for efficient configuration. Ensure that any required environment variables and their values are specified in the cluster definition; for NebulOuS development purposes, each variable is maintained by the owner of the component that uses it.
5.2. DeployCluster endpoint - Deploying a Kubernetes Cluster.
This endpoint initializes the cluster deployment process. Once started, you can monitor the progress of the deployment.
If the deployment fails (i.e., SAL does not return true), consult the SAL logs (especially those inside the container) and the ProActive logs, i.e. connector-iaas.log.
Note that deployment failures can occur due to various factors. To ensure a successful deployment and execution, confirm that selected cloud and edge nodes are available. Additionally, the information regarding SSH credentials and Execution Adapter scripts used for edge devices during Cloud Registration or Edge Device Registration is validated only at the time of deployment execution.
If the deployment succeeds and returns true, you can track the ongoing progress and troubleshoot any issues using the Execution Adapter interface. Monitoring tools include:
- The ProActive dashboard for an overview of the entire deployment,
- The ProActive Scheduler for details on individual task execution,
- The ProActive Resource Manager to monitor resource utilization.
5.3. GetCluster endpoint - Retrieving Cluster Deployment Status.
This endpoint provides detailed information on the current status of the Kubernetes cluster deployment.
5.4. DeleteCluster endpoint - Deleting a Cluster and Undeploying Resources.
This endpoint enables the deletion of an existing Kubernetes cluster deployment. It removes all associated resources, including worker nodes and applications, effectively undeploying the cluster. Use this endpoint to fully dismantle a cluster and free up resources once the deployment is no longer needed.
6.1. ManageApplication endpoint - Managing application deployment.
This endpoint is used to deploy and manage applications within a specified Kubernetes cluster. It supports both the initial deployment of applications and the reconfiguration of application replicas, allowing you to adjust the number of replicas as needed for scaling and performance optimization.
7.1. ScaleOut endpoint - Scaling out the Cluster.
This endpoint enables dynamic expansion of the Kubernetes cluster by adding additional worker nodes as needed. Use this endpoint to increase the cluster's processing capacity and accommodate higher workloads by scaling out with new resources.
7.2. ScaleIn endpoint - Scaling In the Cluster
This endpoint allows you to scale in the Kubernetes cluster by removing specified worker nodes. Use this endpoint to decrease the cluster's size, optimize resource usage, and reduce operational costs by deallocating unneeded nodes.
7.3. LabelNode endpoint - Managing Node Labels
This endpoint allows you to manage node labels within a Kubernetes cluster, enabling you to add, modify, or remove labels on specific nodes. Use this feature to organize and categorize nodes effectively, which can aid in scheduling, resource management, and targeting specific nodes for workloads.
To scale out an application, follow these steps:
- Add New Worker Nodes: First, use the ScaleOut endpoint to add additional worker nodes to the existing Kubernetes cluster.
- Label the New Worker Nodes: Once the new worker nodes are successfully deployed within the cluster, apply appropriate labels to them using the LabelNode endpoint. Proper labeling is essential for organizing and targeting nodes for specific workloads.
- Increase Application Replicas: Finally, to complete the scale-out process, adjust the number of application replicas by calling the ManageApplication endpoint. This will ensure the application takes advantage of the newly added worker nodes.
To scale in an application, follow these steps:
- Label the Nodes for Removal: First, use the LabelNode endpoint to mark specific worker nodes as unavailable for new application replicas. This ensures that no new replicas are assigned to these nodes during the scaling process.
- Adjust Application Replicas: Next, call the ManageApplication endpoint with a reduced number of replicas to gradually remove the application from the marked nodes.
- Remove Worker Nodes: Finally, once the application replicas have been removed from the designated nodes, use the ScaleIn endpoint to remove the worker nodes from the cluster, optimizing resource usage and reducing operational costs.
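The ordering of the two procedures above matters: scale-out ends with ManageApplication, while scale-in ends with ScaleIn. The sketch below captures only that ordering. `SalClient` and its method names and parameters are hypothetical stand-ins for a client wrapping the ScaleOut, ScaleIn, LabelNode, and ManageApplication endpoints; the real request payloads are described in the SAL endpoint documentation. Waiting for nodes to finish deploying, or for replicas to drain, is left to the caller.

```python
from typing import Protocol, Sequence


class SalClient(Protocol):
    """Hypothetical thin wrapper around the SAL endpoints named in the comments."""

    def scale_out(self, cluster: str, nodes: Sequence[str]) -> None: ...  # ScaleOut
    def scale_in(self, cluster: str, nodes: Sequence[str]) -> None: ...  # ScaleIn
    def label_nodes(self, cluster: str, labels: dict[str, str]) -> None: ...  # LabelNode
    def manage_application(self, cluster: str, app: str, replicas: int) -> None: ...  # ManageApplication


def scale_out_application(
    sal: SalClient, cluster: str, app: str,
    new_nodes: Sequence[str], label: str, replicas: int,
) -> None:
    """Add workers, label them, then raise the replica count."""
    sal.scale_out(cluster, new_nodes)
    # In practice, wait here until the new workers are deployed in the cluster.
    sal.label_nodes(cluster, {node: label for node in new_nodes})
    sal.manage_application(cluster, app, replicas)


def scale_in_application(
    sal: SalClient, cluster: str, app: str,
    nodes: Sequence[str], drain_label: str, replicas: int,
) -> None:
    """Mark workers as unavailable, lower the replica count, then remove the workers."""
    sal.label_nodes(cluster, {node: drain_label for node in nodes})
    sal.manage_application(cluster, app, replicas)
    # In practice, wait here until no replicas remain on the marked nodes.
    sal.scale_in(cluster, nodes)
```

Keeping the client behind a small interface like this also makes the sequences easy to exercise against a fake before running them against a real cluster.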
8.1. DeleteEdgeNode endpoint - TBD
For the progress of this task, consult here.
9.1. RemoveClouds endpoint - TBD
For the progress of this task, consult here.
NebulOuS SAL is deployed with a chart managed at https://github.com/eu-nebulous/helm-charts/tree/main/charts/nebulous-sal
The original NebulOuS SAL deployment script can be found at https://github.com/ow2-proactive/scheduling-abstraction-layer/tree/master/deployment
Please bear in mind that the values in the helm chart can be overwritten in the nrec deployment definition:
cd environment: https://github.com/eu-nebulous/nrec-flux-config/blob/main/clusters/primary/nebulous-cd/helm-releases/specific-patches/nebulous-sal.yaml
prod environment: https://github.com/eu-nebulous/nrec-flux-config/blob/main/clusters/primary/nebulous-prod/helm-releases/specific-patches/nebulous-sal.yaml
test environment: https://github.com/eu-nebulous/nrec-flux-config/blob/main/clusters/primary/nebulous-test/helm-releases/specific-patches/nebulous-sal.yaml
dev environment: https://github.com/eu-nebulous/nrec-flux-config/blob/main/clusters/primary/nebulous-dev/helm-releases/specific-patches/nebulous-sal.yaml