feat(docs): add more configuration details to the HPA docs
* improve overall docs clarity
* explain prometheus-adapter ConfigMap customisation options
lc525 committed Oct 30, 2024
1 parent 294b5f8 commit 6b0a72f
Showing 1 changed file with 101 additions and 89 deletions.
190 changes: 101 additions & 89 deletions docs-gb/kubernetes/hpa-rps-autoscaling.md
@@ -29,6 +29,9 @@ helm install --set prometheus.url='http://seldon-monitoring-prometheus' hpa-metr
In the commands above, we install `prometheus-adapter` as a helm release named `hpa-metrics` in
the same namespace as our prometheus install, and point to its service URL (without the port).

The URL is not fully qualified as we are referencing prometheus from within the same namespace.
If you are using a separately-managed prometheus instance, please update the URL accordingly.

If running Prometheus on a different port than the default 9090, you can also pass `--set
prometheus.port=[custom_port]`. You may inspect all the options available as helm values by
running `helm show values prometheus-community/prometheus-adapter`.
@@ -38,17 +41,13 @@ per-model RPS values. On install, the adapter has created a `ConfigMap` in the s
itself, named `[helm_release_name]-prometheus-adapter`. In our case, it will be
`hpa-metrics-prometheus-adapter`.

We want to overwrite this ConfigMap with the content below (please change the name if your helm
release has a different one). The manifest contains embedded documentation, highlighting how we
match the `seldon_model_infer_total` metric in Prometheus, compute a rate via a `metricsQuery`
and expose this to k8s as the `infer_rps` metric, on a per (model, namespace) basis.
We want to overwrite this ConfigMap as shown in the following example.

Other aggregations on per (server, namespace) and (pod, namespace) are also exposed and may be
used in HPA, but we will focus on the (model, namespace) aggregation in the examples below.
{% hint style="warning" %}
Please change the `name` if you've chosen a different value for the `prometheus-adapter` helm release name.
Please change the `namespace` to match the namespace where `prometheus-adapter` is installed.
{% endhint %}

You may want to modify some of the settings to match the prometheus query that you typically use
for RPS metrics. For example, the `metricsQuery` below computes the RPS by calling
[`rate()`](https://prometheus.io/docs/prometheus/latest/querying/functions/#rate) with a 1-minute window.

````yaml
apiVersion: v1
@@ -59,83 +58,21 @@ metadata:
data:
config.yaml: |-
"rules":
# Rule matching Seldon inference requests-per-second metrics and exposing aggregations for
# specific k8s models, servers, pods and namespaces
#
# Uses the prometheus-side `seldon_model_(.*)_total` inference request count metrics to
# compute and expose k8s custom metrics on inference RPS `${1}_rps`. A prometheus metric named
# `seldon_model_infer_total` will be exposed as multiple `[group-by-k8s-resource]/infer_rps`
# k8s metrics, for consumption by HPA.
#
# One k8s metric is generated for each k8s resource associated with a prometheus metric, as
# defined in the "Association" section below. Because this association is defined based on
# labels present in the prometheus metric, the number of generated k8s metrics will vary
# depending on what labels are available in each discovered prometheus metric.
#
# The resources associated through this rule (when available as labels for each of the
# discovered prometheus metrics) are:
# - models
# - servers
# - pods (inference server pods)
# - namespaces
#
# For example, you will get aggregated metrics for `models.mlops.seldon.io/iris0/infer_rps`,
# `servers.mlops.seldon.io/mlserver/infer_rps`, `pods/mlserver-0/infer_rps`,
# `namespaces/seldon-mesh/infer_rps`
#
# Metrics associated with any resource except the namespace one (models, servers and pods)
# need to be requested in the context of a particular namespace.
#
# To fetch those k8s metrics manually once the prometheus-adapter is running, you can run:
#
# For "namespaced" resources, i.e. models, servers and pods (replace values in brackets):
# ```
# kubectl get --raw
# "/apis/custom.metrics.k8s.io/v1beta1/namespaces/[NAMESPACE]/[RESOURCE_NAME]/[CR_NAME]/infer_rps"
# ```
#
# For example:
# ```
# kubectl get --raw
# "/apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/models.mlops.seldon.io/iris0/infer_rps"
# ```
#
# For the namespace resource, you can get the namespace-level aggregation of the metric with:
# ```
# kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/infer_rps"
# ```
-
# Metric discovery: selects subset of metrics exposed in Prometheus, based on name and
# filters
"seriesQuery": |
{__name__=~"^seldon_model.*_total",namespace!=""}
"seriesFilters":
- "isNot": "^seldon_.*_seconds_total"
- "isNot": "^seldon_.*_aggregate_.*"
# Association: maps label values in the Prometheus metric to K8s resources (native or CRs)
# Below, we associate the "model" prometheus metric label to the corresponding Seldon Model
# CR, the "server" label to the Seldon Server CR, etc.
"resources":
"overrides":
"model": {group: "mlops.seldon.io", resource: "model"}
"server": {group: "mlops.seldon.io", resource: "server"}
"pod": {resource: "pod"}
"namespace": {resource: "namespace"}
# Rename prometheus metrics to get k8s metric names that reflect the processing done via
# the query applied to those metrics (actual query below under the "metricsQuery" key)
"name":
"matches": "^seldon_model_(.*)_total"
"as": "${1}_rps"
# The actual query to be executed against Prometheus to retrieve the metric value
# Here:
# - .Series is replaced by the discovered prometheus metric name (e.g.
# `seldon_model_infer_total`)
# - .LabelMatchers, when requesting a metric for a namespaced resource X with name x in
# namespace n, is replaced by `X=~"x",namespace="n"`. For example, `model=~"iris0",
# namespace="seldon-mesh"`. When requesting the namespace resource itself, only the
# `namespace="n"` is kept.
# - .GroupBy is replaced by the resource type of the requested metric (e.g. `model`,
# `server`, `pod` or `namespace`).
"metricsQuery": |
sum by (<<.GroupBy>>) (
rate (
@@ -144,13 +81,75 @@
)
````

Apply the config, and restart the prometheus adapter deployment (this restart is required so
that prometheus-adapter picks up the new config):
In our example, a single rule is defined to fetch the `seldon_model_infer_total` metric
from Prometheus, compute its rate over a 1-minute window, and expose this to k8s as the `infer_rps`
metric, with aggregations at model, server, inference server pod and namespace level.

The rule definition can be broken down into four parts:

* _Discovery_ (the `seriesQuery` and `seriesFilters` keys): this controls what Prometheus
metrics are considered for exposure via the k8s custom metrics API.

In the example, all the Seldon Prometheus metrics of the form `seldon_model_*_total` are
considered, excluding metrics pre-aggregated across all models (`.*_aggregate_.*`) as well as
the cumulative inference time per model (`.*_seconds_total`). For RPS, we are only interested in
the model inference count (`seldon_model_infer_total`).

* _Association_ (the `resources` key): controls the Kubernetes resources that a particular
metric can be attached to or aggregated over.

The `resources` key defines an association between certain labels from the Prometheus metric and
k8s resources. For example, `"model": {group: "mlops.seldon.io", resource: "model"}` lets
`prometheus-adapter` know that, for the selected Prometheus metrics, the value of the "model"
label represents the name of a k8s `model.mlops.seldon.io` CR.

One k8s custom metric is generated for each k8s resource associated with a prometheus metric.
In this way, it becomes possible to request the k8s custom metric values for
`models.mlops.seldon.io/iris` or for `servers.mlops.seldon.io/mlserver`.

The labels that *do not* refer to a `namespace` resource generate "namespaced" custom
metrics (their values refer to resources that exist within a namespace).


* _Naming_ (the `name` key): configures the naming of the k8s custom metric

In the example ConfigMap, this is configured to take the Prometheus metric named
`seldon_model_infer_total` and expose custom metric endpoints named `infer_rps`, which when
called return the result of a query over the Prometheus metric.

The matching over the Prometheus metric name uses regex group capture expressions, which can
then be referenced in the custom metric name.

* _Querying_ (the `metricsQuery` key): defines how a request for a specific k8s custom metric gets
converted into a Prometheus query.

The query can make use of the following placeholders:

- `.Series` is replaced by the discovered Prometheus metric name (e.g. `seldon_model_infer_total`)
- `.LabelMatchers`, when requesting a namespaced metric for resource `X` with name `x` in
namespace `n`, is replaced by `X=~"x",namespace="n"`. For example, `model=~"iris0",
namespace="seldon-mesh"`. When requesting the namespace resource itself, only the
`namespace="n"` part is kept.
- `.GroupBy` is replaced by the resource type of the requested metric (e.g. `model`,
`server`, `pod` or `namespace`).

You may want to modify the query in the example to match the one that you typically use in
your monitoring setup for RPS metrics. The example calls [`rate()`](https://prometheus.io/docs/prometheus/latest/querying/functions/#rate)
with a 1-minute window.
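Putting the placeholders together: assuming the rule from the ConfigMap above (with its 1-minute `rate()` window), a request for the `infer_rps` metric of the `models.mlops.seldon.io` resource named `irisa0` in namespace `seldon-mesh` would be converted into a Prometheus query roughly like the following (illustrative only; the exact shape depends on your `metricsQuery` template):

```promql
sum by (model) (
  rate (
    seldon_model_infer_total{model=~"irisa0", namespace="seldon-mesh"}[1m]
  )
)
```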


For a complete reference on how `prometheus-adapter` can be configured via the `ConfigMap`, please
consult the [prometheus-adapter configuration docs](https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/config.md).
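To build intuition for what `rate()` computes over a counter such as `seldon_model_infer_total`, here is a simplified sketch in Python. The sampled values are made up for illustration, and the sketch ignores details the real Prometheus implementation handles (counter resets, extrapolation at the window edges):

```python
# Simplified sketch of the per-second rate computation that Prometheus
# rate() performs over a counter metric. Sample values are hypothetical.

def per_second_rate(samples):
    """Approximate rate(): per-second increase of a counter over the
    window covered by `samples`, a list of (timestamp_seconds, value).

    Assumes no counter resets and at least two samples in the window.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# A counter sampled every 15s over a 1-minute window: 120 inferences
# happened during the window, so the RPS is 120 / 60 = 2.0.
window = [(0, 1000), (15, 1030), (30, 1060), (45, 1090), (60, 1120)]
print(per_second_rate(window))  # 2.0
```

Prometheus performs this computation server-side; the sketch is only meant to clarify the role of the 1-minute window in the example `metricsQuery`.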


Once you have applied any necessary customisations, replace the default prometheus-adapter config
with the new one, and restart the deployment (this restart is required so that prometheus-adapter
picks up the new config):

```sh
# Apply prometheus adapter config
kubectl apply -f prometheus-adapter.config.yaml
# Restart prom-adapter pods
# Replace default prometheus adapter config
kubectl replace -f prometheus-adapter.config.yaml
# Restart prometheus-adapter pods
kubectl rollout restart deployment hpa-metrics-prometheus-adapter -n seldon-monitoring
```

@@ -173,25 +172,33 @@ List available metrics
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/ | jq .
```

Fetching model RPS metric for specific (namespace, model) pair:
For namespaced metrics, the general template for fetching is:

```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/models.mlops.seldon.io/irisa0/infer_rps
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/[NAMESPACE]/[API_RESOURCE_NAME]/[CR_NAME]/[METRIC_NAME]"
```

Fetching model RPS metric aggregated at the (namespace, server) level:
For example:

```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/servers.mlops.seldon.io/mlserver/infer_rps
```
* Fetching model RPS metric for specific (namespace, model) pair:

Fetching model RPS metric aggregated at the (namespace, pod) level:
```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/models.mlops.seldon.io/irisa0/infer_rps
```

```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/pods/mlserver-0/infer_rps
```
* Fetching model RPS metric aggregated at the (namespace, server) level:

```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/servers.mlops.seldon.io/mlserver/infer_rps
```

Fetching the same metric aggregated at namespace level:
* Fetching model RPS metric aggregated at the (namespace, pod) level:

```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/pods/mlserver-0/infer_rps
```
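Each of these requests returns a `MetricValueList` object from the custom metrics API. For the model-level query, the response has roughly the following shape (the `timestamp` and `value` fields below are hypothetical):

```json
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {
      "describedObject": {
        "kind": "Model",
        "namespace": "seldon-mesh",
        "name": "irisa0",
        "apiVersion": "mlops.seldon.io/v1alpha1"
      },
      "metricName": "infer_rps",
      "timestamp": "2024-10-30T10:00:00Z",
      "value": "2"
    }
  ]
}
```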

Fetching the same metric aggregated at namespace level (not namespaced):

```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/infer_rps
@@ -204,8 +211,13 @@ the same metric: one scaling the Model, the other the Server. The example below
the mapping between Models and Servers is 1-to-1 (i.e. no multi-model serving).

Consider a model named `irisa0` with the following manifest. Please note we don’t set
`minReplicas/maxReplicas` this is in order to disable the seldon-specific autoscaling so that it
doesn’t interact with HPA.
`minReplicas/maxReplicas`. This disables the Seldon lag-based autoscaling so that it
doesn’t interact with HPA (separate `minReplicas/maxReplicas` configs will be set on the HPA
side).

You must also explicitly define a value for `spec.replicas`. This is the field HPA modifies to
change the number of replicas; if it is not present in the manifest, HPA will not work until the
Model CR is updated to define `spec.replicas`.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
