---
title: ATS Autoscaling
parent: alfresco-transform-service
grand_parent: Guides
---

# Alfresco Transform Service auto scaling

This document describes the auto scaling principles implemented in this Helm
chart.

This document does not explain how to set up Kubernetes worker node
auto-scaling. That is a completely different topic which can be addressed in
different ways and is up to the Kubernetes administrator.

## Horizontal Pod Autoscaling

For general concepts about HPA please refer to the [official Kubernetes
documentation](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/).

Tengine pods auto scaling is disabled by default. If you want to enable it,
you need to set the additional value:

```yaml
tengine:
  autoscaling:
    enabled: true
```
> Where `tengine` is one of `imagemagick`, `libreoffice`, `transformmisc`,
> `tika` or `pdfrenderer`.
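
For example, to enable auto scaling for just the ImageMagick and Tika engines,
a values override file could look like this (a minimal sketch following the
pattern above):

```yaml
# values-autoscaling.yaml: enable CPU based HPA for two of the five engines
imagemagick:
  autoscaling:
    enabled: true
tika:
  autoscaling:
    enabled: true
```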

The default configuration implemented in this chart aims to cope with peak
load and to spare resources during periods of low utilization, while also
minimizing the number of scaling events, because changing the cluster topology
means additional computation.

### Default behaviour

Without any further configuration, scaling would happen as follows:

* The cluster would spin up new pods every minute if the pods' CPU usage
  remains above 75% on average for 30 seconds.
* The cluster would spin up no more than 2 pods, or 50% more pods relative to
  the existing replicas (whichever is bigger), per minute.
* There would never be more than 3 replicas in the Replicaset.
* There would never be fewer than 1 replica in the Replicaset.
* The cluster would remove at most one pod if the CPU load stays consistently
  below 75% on average for 5 minutes.
* The cluster would remove pods only one after the other, within a minute of
  each other.

> CPU utilization/load is calculated with regard to the CPU resource request
> setting (`.resources.requests.cpu`), which is set to 1 CPU by default.
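
Expressed as a rendered `autoscaling/v2` HorizontalPodAutoscaler, the defaults
described above would look roughly like the sketch below (assuming the chart
renders a standard HPA resource; the target names are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ats-imagemagick             # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ats-imagemagick           # illustrative name
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75    # scale when average CPU usage exceeds 75%
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30   # load must stay over target for 30s
      selectPolicy: Max                # pick whichever policy adds more pods
      policies:
        - type: Pods
          value: 2                     # at most 2 new pods per minute...
          periodSeconds: 60
        - type: Percent
          value: 50                    # ...or 50% more, whichever is bigger
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # 5 minutes below target before removal
      policies:
        - type: Pods
          value: 1                     # remove one pod at a time
          periodSeconds: 60
```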

### Customizing auto scaling

The values and behaviour exposed above are defaults we think are sensible to
start with. Of course, they will not fit every single deployment/installation
of Alfresco on Kubernetes.

Below are ways to tweak the auto scaling behaviour for your own setup.

#### Setting the CPU resources correctly

The CPU resources request is the basis of calculation for the cluster to
trigger scaling events. It is therefore very important to make sure it is set
appropriately before enabling auto scaling.
Imagine you have a production Kubernetes cluster with large worker nodes (say
3 * 32 CPU nodes). Having the CPU request set to 1 would most likely make the
cluster spin up new pods very quickly. Instead, it would be better to ensure
your tengine pods have a sensible `.resources.requests.cpu` value set to,
say, 4.
You should also note that the very same `.resources.requests.cpu` value is
used by the Kubernetes scheduler, so setting it too high is not a good idea
either. It should be set to a value which will allow pods to be scheduled on
worker nodes alongside other pods.

> The default `.resources.limits.cpu` is set to 4 CPUs, so you will also want
> to increase this value to something like 12.

Just by setting a sensible `.resources.requests.cpu`, the auto scaling
behaviour would already make much more sense given the worker nodes' size.
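
A corresponding values override might look like this (a sketch, assuming each
engine exposes a standard `resources` block; the figures match the 32 CPU
worker node example above):

```yaml
imagemagick:
  resources:
    requests:
      cpu: "4"     # basis for both HPA utilization maths and scheduling
    limits:
      cpu: "12"    # raised from the default of 4 to stay above the request
  autoscaling:
    enabled: true
```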

#### Configuring the auto scaling behaviour

The `autoscaling.*` values below can be fine tuned. There are a few things to
take into account when changing them, as explained below (a combined example
follows the list):

* `.minReplicas`: This parameter is used to limit the lowest number of
  replicas.
* `.maxReplicas`: This parameter is used to limit the highest number of
  replicas.
* `.behavior.scaleUp.stabilizationWindowSeconds`: This parameter is used to
  avoid flapping replicasets. A very short peak load is not worth a scale up,
  so you surely want to avoid scaling without making sure the load has
  increased consistently and a scale up needs to happen. This parameter tells
  for how long the load should be over target before the HPA controller
  decides to increase the number of replicas. The shorter this value, the more
  likely you are to spin up pods for a short peak workload, so a pod could
  even become ready after the load is actually back to normal. Setting this
  too low when the CPU resources request is set to 1 is counter-productive, as
  it's pretty easy to make 1 CPU busy.
* `.behavior.scaleDown.stabilizationWindowSeconds`: This parameter is used in
  the same manner as for scale up events. One notable difference though is
  that scaling down has an immediate effect on the way the application handles
  the workload, while scaling up is more expensive, as pods need to start up
  before they can actually start handling requests. For that reason we think
  one should always be more cautious when bringing the number of replicas
  down. You probably want to avoid taking pods down too quickly if your
  workload is not very consistent.
* `.behavior.scaleUp.policies[]`:
  * `.periodSeconds`: The faster an individual tengine pod is to start up
    (complete Tomcat startup), the lower this parameter can be. 60s appeared
    to be a good value for startup times around 90s. The lower this parameter,
    the faster new pods can be spun up and the faster peak load can be
    handled.
  * `.type` & `.value`: For exact details on setting this part of the auto
    scaler policy, check the [Kubernetes
    documentation](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#scaling-policies).
    Policies let you define the way you want to act upon scaling events. For
    scale up events, if you know your load peaks are steep (but consistent),
    then you will want to scale the replicaset by more pods than if your load
    is growing more slowly.
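
Putting it together, a customized configuration could look like the values
override below (a sketch; the key paths mirror the `autoscaling.*` parameters
above, assuming the chart passes the `behavior` section through to the HPA
spec, and the numbers are purely illustrative):

```yaml
imagemagick:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 6
    behavior:
      scaleUp:
        # Require 60s of sustained load before scaling up, then allow up to
        # 4 new pods per minute to absorb steep but consistent peaks.
        stabilizationWindowSeconds: 60
        policies:
          - type: Pods
            value: 4
            periodSeconds: 60
      scaleDown:
        # Be more cautious on the way down: wait 10 minutes of low
        # utilization before removing a single pod.
        stabilizationWindowSeconds: 600
        policies:
          - type: Pods
            value: 1
            periodSeconds: 60
```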