Replies: 2 comments
-
Questions arising from team discussion around this feature that we need to resolve. Initial assumption: autoscaled jobs are not preempt-able by definition, yet...
- Once autoscaled jobs have reached the max configured limit, what happens? Do they then become preempt-able again?
- What else happens when autoscale limits have been reached?
- How do we decide what workload is autoscalable? Via metadata? Algorithmically by priority?
- We need to configure limits: by multiplier with a max?
- Priority: how do we infer it? From the queue? From the podspec?
- How do we downscale when autoscaled jobs have completed?
-
I think this goes well with our idea of a toggleable scheduler algorithm. In general I think this could be useful for autoscaling and for scheduling to clusters that have a queueing solution enabled (YuniKorn/Kueue).
-
Problem Statement
Currently, the Armada Kubernetes Batch Scheduler assumes a fixed pool of resources (nodes, CPU, memory, and GPU) for its scheduling algorithms. This approach poses challenges when working with Kubernetes autoscalers such as the Cluster Autoscaler or cloud vendor-specific autoscalers: because the Armada Server denies submission requests for workloads that cannot be scheduled with the currently available resources, pending pods never appear on the cluster and the autoscaler is never triggered to scale up.
Proposed Solution: Implement Overscheduling
To address this issue, we propose implementing the concept of overscheduling in Armada. Overscheduling means submitting workloads to Executor clusters even when there are not enough resources initially available. The resulting pending pods trigger an autoscaler to add more nodes to the Executor cluster, enabling the pods to be scheduled successfully.
Implementation Details
The implementation of overscheduling in Armada Kubernetes Batch Scheduler would involve the following steps:
1. Configuration: Introduce a new configuration parameter in Armada Server to define the overscheduling percentage. This parameter determines how much additional workload may be submitted even when resources are temporarily insufficient.
2. Submission to Executor Cluster: When Armada Server receives a workload submission request, it checks the available resources in the Executor cluster. If the workload fits within the available resources plus the configured overscheduling allowance, Armada Server submits it to the Executor Kubernetes cluster (a minimal sketch of this check follows the list).
3. Pending Pods: Upon submission, the pods enter the pending state in the Executor Kubernetes cluster due to resource scarcity. This is expected behavior under overscheduling.
4. Autoscaling Trigger: The presence of pending pods triggers the autoscaler (e.g., Cluster Autoscaler) associated with the Executor Kubernetes cluster. The autoscaler evaluates the pending workload and initiates scaling by adding more nodes to the Executor cluster.
5. Resource Availability and Scheduling: As the autoscaler provisions additional nodes, resources become available in the Executor cluster. The Kubernetes scheduler then schedules the pending pods onto the newly added nodes.
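To make steps 1-2 concrete, here is a minimal Go sketch of the server-side check, assuming a percentage-based configuration. All names here (`SchedulingConfig`, `OverschedulingPercent`, `canSubmit`, `ResourceList`) are hypothetical illustrations, not existing Armada APIs:

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for cluster/job resources.
type ResourceList struct {
	CPUMillis int64
	MemoryMiB int64
}

// SchedulingConfig carries the proposed overscheduling parameter.
type SchedulingConfig struct {
	// OverschedulingPercent is how much extra workload, as a percentage of
	// currently available capacity, may be submitted while resources are
	// temporarily insufficient. 0 disables overscheduling.
	OverschedulingPercent int64
}

// canSubmit allows submission while the total requested work (including
// already overscheduled, still-pending work) stays within available
// capacity plus the configured overscheduling allowance.
func canSubmit(cfg SchedulingConfig, available, requested, pending ResourceList) bool {
	limitCPU := available.CPUMillis * (100 + cfg.OverschedulingPercent) / 100
	limitMem := available.MemoryMiB * (100 + cfg.OverschedulingPercent) / 100
	return pending.CPUMillis+requested.CPUMillis <= limitCPU &&
		pending.MemoryMiB+requested.MemoryMiB <= limitMem
}

func main() {
	cfg := SchedulingConfig{OverschedulingPercent: 50}
	available := ResourceList{CPUMillis: 4000, MemoryMiB: 4096}
	job := ResourceList{CPUMillis: 5000, MemoryMiB: 4096}       // does not fit today
	fmt.Println(canSubmit(cfg, available, job, ResourceList{})) // true: within the 50% allowance
}
```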
Benefits
By implementing overscheduling in Armada Kubernetes Batch Scheduler, the following benefits can be realized:
- Improved Resource Utilization: Overscheduling lets Armada drive resources to their maximum capacity when demand is high, while avoiding paying for idle capacity during periods of lower workload.
- Seamless Integration with Autoscalers: Armada can work seamlessly with the Kubernetes Cluster Autoscaler and cloud vendor-specific autoscalers, which detect pending pods and automatically scale the Executor cluster to meet the additional resource requirements.
- Enhanced Scalability: Overscheduling ensures that Armada can handle spikes in workload demand by dynamically scaling the Executor cluster, allowing better responsiveness to workload fluctuations.
- Flexibility and Customization: The overscheduling percentage can be configured based on specific requirements and workload characteristics; users can fine-tune it to balance overscheduling against resource availability.
Technical Details
- Introduce a new parameter in the Executor called `overscheduleMultiplier`, which allows overscheduling up to the largest node's resources multiplied by the overscheduling multiplier.
- Edit the Executor to report back whether it supports overscheduling and by how much.
- Edit the scheduler logic to allow overscheduling if an Executor cluster supports it.
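As a rough sketch (not existing Armada code), the proposed condition could look like this in Go, with `ResourceList`, `ExecutorReport`, and `canOverschedule` as illustrative names and the Executor reporting its largest node alongside the multiplier:

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for node/job resources.
type ResourceList struct {
	CPUMillis int64
	MemoryMiB int64
}

// ExecutorReport is what an Executor could report back to the scheduler:
// whether it supports overscheduling and by how much.
type ExecutorReport struct {
	OverschedulingEnabled  bool
	OverscheduleMultiplier int64
	LargestNode            ResourceList
}

// canOverschedule implements the proposed condition
// jobResources <= largest(nodeResources) * overscheduleMultiplier.
func canOverschedule(job ResourceList, rep ExecutorReport) bool {
	if !rep.OverschedulingEnabled {
		return false
	}
	return job.CPUMillis <= rep.LargestNode.CPUMillis*rep.OverscheduleMultiplier &&
		job.MemoryMiB <= rep.LargestNode.MemoryMiB*rep.OverscheduleMultiplier
}

func main() {
	rep := ExecutorReport{
		OverschedulingEnabled:  true,
		OverscheduleMultiplier: 4,
		LargestNode:            ResourceList{CPUMillis: 2000, MemoryMiB: 2048},
	}
	job := ResourceList{CPUMillis: 2000, MemoryMiB: 6144} // bigger than any current node
	fmt.Println(canOverschedule(job, rep))                // true: 6144 <= 2048*4
}
```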
Example
Let's assume the following 30-day analysis of workload distribution.
On average, 350 workloads per day were scheduled using a cluster of 5 nodes, which roughly translates to one node servicing 70 workloads per day. Let's further assume that for 20 of those days we had 20 workloads/day, and for the remaining 10 days we had 1000 workloads/day.
Without autoscaling, for 20 days the cluster remained underutilized, with only 20 workloads per day and 4 of the 5 nodes effectively idle, while for the remaining 10 days the workload surged to 1000 workloads per day, exceeding the cluster's capacity by roughly 10 additional nodes' worth of demand.
For 20 days we paid for more resources than needed, and for 10 days our cluster could not handle the workload volume.
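A quick back-of-the-envelope check of those numbers (in Go, purely illustrative):

```go
package main

import "fmt"

func main() {
	const perNode = 70 // workloads one node can service per day

	quiet, busy := 20, 1000 // workloads/day on quiet vs. busy days
	fmt.Println((20*quiet + 10*busy) / 30)       // 346: the ~350/day average
	fmt.Println((quiet + perNode - 1) / perNode) // 1: nodes needed on quiet days (4 of 5 idle)
	fmt.Println((busy+perNode-1)/perNode - 5)    // 10: extra nodes needed on busy days
}
```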
Scenarios
Let's assume our Kubernetes cluster consists of a couple of `t3.small` (2 vCPU, 2 GB memory) instances, and that the autoscaler is configured to provision the instance types `t3.small`, `t3.medium`, and `t3.large`.
Job is larger than current Node instance types
The cluster is limited in terms of resources, and an Armada Job requires more resources than a single `t3.small` node can provide. To determine whether the job can be overscheduled, the condition `jobResources <= largest(nodeResources) * overscheduleMultiplier` must be satisfied. If the condition holds, Armada will trigger overscheduling by submitting the job to the Executor cluster, even though there are not enough resources initially available. This prompts the autoscaler to provision additional (and, if needed, larger) nodes for the Executor cluster, allowing the job to be scheduled and executed successfully.
Job is smaller than current Node instance types
In this scenario, the Armada Job's resource requirements are less than or equal to the resources of the largest node instance type, but the cluster lacks enough nodes to handle additional workloads. In that case, Armada should allow the submission of the job even though current node capacity is insufficient. This triggers the autoscaler to provision additional nodes for the Executor cluster. As a result, the required resources become available, and the job can be scheduled and executed successfully.
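A hypothetical walk-through of both scenarios, reusing the `canOverschedule` sketch from Technical Details and treating `largest(nodeResources)` as the biggest node currently in the cluster (a `t3.small`), so the multiplier covers the larger instance types the autoscaler may add:

```go
package main

import "fmt"

type ResourceList struct {
	CPUMillis int64
	MemoryMiB int64
}

// canOverschedule: jobResources <= largest(nodeResources) * multiplier.
func canOverschedule(job, largestNode ResourceList, multiplier int64) bool {
	return job.CPUMillis <= largestNode.CPUMillis*multiplier &&
		job.MemoryMiB <= largestNode.MemoryMiB*multiplier
}

func main() {
	t3small := ResourceList{CPUMillis: 2000, MemoryMiB: 2048} // 2 vCPU, 2 GB

	// Scenario 1: the job needs 6 GB, more than any current t3.small node.
	// A multiplier of 4 (2 GB x 4 = 8 GB, roughly a t3.large) still admits it.
	big := ResourceList{CPUMillis: 2000, MemoryMiB: 6144}
	fmt.Println(canOverschedule(big, t3small, 4)) // true

	// Scenario 2: the job fits a t3.small; the cluster just needs more nodes.
	// The condition holds trivially, so the job is submitted and the pending
	// pod prompts the autoscaler to add nodes.
	small := ResourceList{CPUMillis: 500, MemoryMiB: 512}
	fmt.Println(canOverschedule(small, t3small, 4)) // true
}
```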
Open Questions
Conclusion
Implementing overscheduling in Armada will enable seamless integration with Autoscalers and improve resource utilization. By introducing the concept of overscheduling, Armada can dynamically adapt to workload demands and effectively scale its Executor clusters. This enhancement will provide users with more flexibility and customization options, ensuring optimal performance and resource allocation in Kubernetes batch scheduling scenarios.