buffer replicas / resource request based on label #97

Open
harpratap opened this issue Aug 10, 2023 · 3 comments
Labels
kind/feature New feature or request

Comments

@harpratap
Contributor

harpratap commented Aug 10, 2023

Spot Nodes are very cheap, but they carry a high risk of being evicted at very short notice.

To use Spot VMs efficiently, we can always run a much higher replica count than we would on OnDemand Nodes. This solves two problems:

  • Pods are spread across more nodes, so there is less risk of 100% of the replicas being evicted at the same time.
  • The eviction notice is short (120 seconds at best), but new pods can take a long time to become ready (sometimes more than 5 minutes), so the existing replicas must be able to handle traffic even when many replicas are down. That means CPU utilization per pod has to be lower than on OnDemand Nodes.

So the suggestion is to run 25% (or even 50%) more replicas than the ideal count when a Deployment is running completely on Spot Nodes.
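
To make the arithmetic behind this suggestion concrete, here is a minimal sketch. The function names and the example numbers are hypothetical and are not part of Tortoise; it only illustrates how a proportional replica buffer lowers per-pod CPU utilization so that the surviving pods can absorb an eviction.

```python
def buffered_replicas(base_replicas: int, buffer_percent: int) -> int:
    """Replica count after adding a proportional buffer, rounded up.

    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    total = base_replicas * (100 + buffer_percent)
    return -(-total // 100)  # ceiling division


def utilization_after_eviction(
    total_load_cores: float,
    cpu_request_cores: float,
    replicas: int,
    evicted: int,
) -> float:
    """Per-pod CPU utilization (0.0 to 1.0+) once `evicted` replicas are gone.

    Assumes the load is spread evenly across the surviving replicas.
    """
    surviving = replicas - evicted
    if surviving <= 0:
        raise ValueError("no replicas left to serve traffic")
    return total_load_cores / (surviving * cpu_request_cores)


if __name__ == "__main__":
    replicas = buffered_replicas(8, 25)  # 8 replicas plus a 25% buffer
    print(replicas)
    # 4 cores of load, 1-core pods, half of the replicas evicted at once
    print(utilization_after_eviction(4.0, 1.0, replicas, replicas // 2))
```

With these example numbers, 8 replicas plus a 25% buffer gives 10 replicas, and losing half of them still leaves the survivors at 80% utilization instead of overloading them.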

@sanposhiho sanposhiho added the kind/feature New feature or request label Aug 10, 2023
@sanposhiho
Collaborator

sanposhiho commented Aug 10, 2023

This sounds related to --upper-target-resource-utilization, which we already have in the admin configuration:
https://github.com/mercari/tortoise/blob/main/docs/flag-configuration.md#upper-target-resource-utilization

So we can extend this for this use case, so that the admin can specify which kinds of Pods get an XXX% target at most.
For example:

upperTargetResourceUtilization:
- labelSelector:
    matchExpressions:
    - key: nodepool
      operator: In
      values:
      - spot
  upperTargetResourceUtilization: 50%
- labelSelector:
    matchExpressions:
    - key: nodepool
      operator: NotIn
      values:
      - spot
  upperTargetResourceUtilization: 80%

(after #98)
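
As a rough illustration of how such a per-label rule list could be evaluated, here is a minimal sketch. It is not Tortoise's implementation: the function names, the first-match-wins ordering, and the fallback default are all assumptions made for the example. The In/NotIn handling follows the usual Kubernetes label selector semantics, where NotIn also matches Pods that do not carry the key at all.

```python
from typing import Dict, List


def expression_matches(expr: dict, labels: Dict[str, str]) -> bool:
    """Evaluate one matchExpressions entry against a Pod's labels."""
    key = expr["key"]
    operator = expr["operator"]
    values = expr.get("values", [])
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # A missing key also satisfies NotIn.
        return key not in labels or labels[key] not in values
    raise ValueError(f"unsupported operator: {operator}")


def upper_target_for(rules: List[dict], labels: Dict[str, str], default: int) -> int:
    """Return the target (in percent) of the first rule whose selector matches.

    Assumption: rules are checked in order and the first match wins.
    """
    for rule in rules:
        expressions = rule["labelSelector"].get("matchExpressions", [])
        if all(expression_matches(e, labels) for e in expressions):
            return rule["upperTargetResourceUtilization"]
    return default


# The proposed configuration, written as Python data with integer percentages.
RULES = [
    {
        "labelSelector": {"matchExpressions": [
            {"key": "nodepool", "operator": "In", "values": ["spot"]},
        ]},
        "upperTargetResourceUtilization": 50,
    },
    {
        "labelSelector": {"matchExpressions": [
            {"key": "nodepool", "operator": "NotIn", "values": ["spot"]},
        ]},
        "upperTargetResourceUtilization": 80,
    },
]

if __name__ == "__main__":
    print(upper_target_for(RULES, {"nodepool": "spot"}, default=90))
    print(upper_target_for(RULES, {"nodepool": "ondemand"}, default=90))
```

Note that with these two rules every Pod matches one of them, so the default is only reached if the rule list is empty or uses narrower selectors.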

@sanposhiho sanposhiho changed the title from "Buffer replicas for Spot Node Pods" to "buffer replicas / resource request based on label" Aug 17, 2023
@sanposhiho sanposhiho added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Aug 17, 2023
@sanposhiho
Collaborator

Another use case: some workloads with a strict SLO need an additional buffer on top of the VPA recommendation.

@sanposhiho
Collaborator

sanposhiho commented Oct 4, 2023

So we need a new configuration for that:

additionalBuffer:
- labelSelector:
    matchExpressions:
    - key: slo
      operator: In
      values:
      - "super-high"
  # Either resources or resourcesRatio should be configured.
  resources:
    cpu: 1        # 1 CPU is always added on top of the VPA recommendation.
    memory: 1GB   # 1GB of memory is always added on top of the VPA recommendation.
  resourcesRatio:
    cpu: 10%      # 10% CPU is always added on top of the VPA recommendation.
    memory: 10%   # 10% memory is always added on top of the VPA recommendation.
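
A minimal sketch of how the two buffer modes above might be applied to a recommendation. This is only an illustration under stated assumptions: the function name is hypothetical, CPU is expressed in millicores and memory in bytes, "either resources or resourcesRatio" is read as "exactly one of the two", and ratio buffers are rounded up.

```python
from typing import Dict, Optional


def apply_additional_buffer(
    recommendation: Dict[str, int],
    resources: Optional[Dict[str, int]] = None,
    resources_ratio: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Add a buffer on top of a VPA-style recommendation.

    recommendation:  e.g. {"cpu": 500, "memory": 1_000_000_000}
                     (millicores and bytes, an assumption of this sketch)
    resources:       absolute amounts to add, in the same units
    resources_ratio: percentages of the recommendation to add
    """
    if (resources is None) == (resources_ratio is None):
        raise ValueError("configure exactly one of resources or resources_ratio")

    buffered = {}
    for name, value in recommendation.items():
        if resources is not None:
            extra = resources.get(name, 0)
        else:
            percent = resources_ratio.get(name, 0)
            extra = -(-value * percent // 100)  # ceiling of value * percent / 100
        buffered[name] = value + extra
    return buffered


if __name__ == "__main__":
    rec = {"cpu": 500, "memory": 1_000_000_000}
    # Absolute mode: add 1 CPU (1000 millicores) and 1GB.
    print(apply_additional_buffer(rec, resources={"cpu": 1000, "memory": 1_000_000_000}))
    # Ratio mode: add 10% of each recommended resource.
    print(apply_additional_buffer(rec, resources_ratio={"cpu": 10, "memory": 10}))
```

Rejecting a rule that sets both modes keeps the behaviour unambiguous; a real implementation could just as well let one mode take precedence.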

@sanposhiho sanposhiho removed the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Feb 15, 2024

2 participants