feat(docs): add more configuration details to the HPA docs
* improve overall docs clarity
* explain prometheus-adapter ConfigMap customisation options
lc525 committed Oct 30, 2024
1 parent 294b5f8 commit 6b0a72f
Showing 1 changed file with 101 additions and 89 deletions.
190 changes: 101 additions & 89 deletions docs-gb/kubernetes/hpa-rps-autoscaling.md
@@ -29,6 +29,9 @@ helm install --set prometheus.url='http://seldon-monitoring-prometheus' hpa-metr
In the commands above, we install `prometheus-adapter` as a helm release named `hpa-metrics` in
the same namespace as our prometheus install, and point to its service URL (without the port).

The URL is not fully qualified as we are referencing prometheus from within the same namespace.
If you are using a separately-managed prometheus instance, please update the URL accordingly.

If running Prometheus on a different port than the default 9090, you can also pass `--set
prometheus.port=[custom_port]`. You may inspect all the options available as helm values by
running `helm show values prometheus-community/prometheus-adapter`.
@@ -38,17 +41,13 @@ per-model RPS values. On install, the adapter has created a `ConfigMap` in the s
itself, named `[helm_release_name]-prometheus-adapter`. In our case, it will be
`hpa-metrics-prometheus-adapter`.

We want to overwrite this ConfigMap with the content below (please change the name if your helm
release has a different one). The manifest contains embedded documentation, highlighting how we
match the `seldon_model_infer_total` metric in Prometheus, compute a rate via a `metricsQuery`
and expose this to k8s as the `infer_rps` metric, on a per (model, namespace) basis.
We want to overwrite this ConfigMap as shown in the following example.

Other aggregations on per (server, namespace) and (pod, namespace) are also exposed and may be
used in HPA, but we will focus on the (model, namespace) aggregation in the examples below.
{% hint style="warning" %}
Please change the `name` if you've chosen a different value for the `prometheus-adapter` helm release name.
Please change the `namespace` to match the namespace where `prometheus-adapter` is installed.
{% endhint %}

You may want to modify some of the settings to match the prometheus query that you typically use
for RPS metrics. For example, the `metricsQuery` below computes the RPS by calling
[`rate()`](https://prometheus.io/docs/prometheus/latest/querying/functions/#rate) with a 1-minute window.

````yaml
apiVersion: v1
@@ -59,83 +58,21 @@ metadata:
data:
config.yaml: |-
"rules":
# Rule matching Seldon inference requests-per-second metrics and exposing aggregations for
# specific k8s models, servers, pods and namespaces
#
# Uses the prometheus-side `seldon_model_(.*)_total` inference request count metrics to
# compute and expose k8s custom metrics on inference RPS `${1}_rps`. A prometheus metric named
# `seldon_model_infer_total` will be exposed as multiple `[group-by-k8s-resource]/infer_rps`
# k8s metrics, for consumption by HPA.
#
# One k8s metric is generated for each k8s resource associated with a prometheus metric, as
# defined in the "Association" section below. Because this association is defined based on
# labels present in the prometheus metric, the number of generated k8s metrics will vary
# depending on what labels are available in each discovered prometheus metric.
#
# The resources associated through this rule (when available as labels for each of the
# discovered prometheus metrics) are:
# - models
# - servers
# - pods (inference server pods)
# - namespaces
#
# For example, you will get aggregated metrics for `models.mlops.seldon.io/iris0/infer_rps`,
# `servers.mlops.seldon.io/mlserver/infer_rps`, `pods/mlserver-0/infer_rps`,
# `namespaces/seldon-mesh/infer_rps`
#
# Metrics associated with any resource except the namespace one (models, servers and pods)
# need to be requested in the context of a particular namespace.
#
# To fetch those k8s metrics manually once the prometheus-adapter is running, you can run:
#
# For "namespaced" resources, i.e. models, servers and pods (replace values in brackets):
# ```
# kubectl get --raw
# "/apis/custom.metrics.k8s.io/v1beta1/namespaces/[NAMESPACE]/[RESOURCE_NAME]/[CR_NAME]/infer_rps"
# ```
#
# For example:
# ```
# kubectl get --raw
# "/apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/models.mlops.seldon.io/iris0/infer_rps"
# ```
#
# For the namespace resource, you can get the namespace-level aggregation of the metric with:
# ```
# kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/infer_rps"
# ```
-
# Metric discovery: selects subset of metrics exposed in Prometheus, based on name and
# filters
"seriesQuery": |
{__name__=~"^seldon_model.*_total",namespace!=""}
"seriesFilters":
- "isNot": "^seldon_.*_seconds_total"
- "isNot": "^seldon_.*_aggregate_.*"
# Association: maps label values in the Prometheus metric to K8s resources (native or CRs)
# Below, we associate the "model" prometheus metric label to the corresponding Seldon Model
# CR, the "server" label to the Seldon Server CR, etc.
"resources":
"overrides":
"model": {group: "mlops.seldon.io", resource: "model"}
"server": {group: "mlops.seldon.io", resource: "server"}
"pod": {resource: "pod"}
"namespace": {resource: "namespace"}
# Rename prometheus metrics to get k8s metric names that reflect the processing done via
# the query applied to those metrics (actual query below under the "metricsQuery" key)
"name":
"matches": "^seldon_model_(.*)_total"
"as": "${1}_rps"
# The actual query to be executed against Prometheus to retrieve the metric value
# Here:
# - .Series is replaced by the discovered prometheus metric name (e.g.
# `seldon_model_infer_total`)
# - .LabelMatchers, when requesting a metric for a namespaced resource X with name x in
# namespace n, is replaced by `X=~"x",namespace="n"`. For example, `model=~"iris0",
# namespace="seldon-mesh"`. When requesting the namespace resource itself, only the
# `namespace="n"` is kept.
# - .GroupBy is replaced by the resource type of the requested metric (e.g. `model`,
# `server`, `pod` or `namespace`).
"metricsQuery": |
sum by (<<.GroupBy>>) (
rate (
@@ -144,13 +81,75 @@
)
````

Apply the config, and restart the prometheus adapter deployment (this restart is required so
that prometheus-adapter picks up the new config):
In our example, a single rule is defined to fetch the `seldon_model_infer_total` metric
from Prometheus, compute its rate over a 1-minute window, and expose this to k8s as the `infer_rps`
metric, with aggregations at model, server, inference server pod and namespace level.

The rule definition can be broken down into four parts:

* _Discovery_ (the `seriesQuery` and `seriesFilters` keys): this controls what Prometheus
metrics are considered for exposure via the k8s custom metrics API.

In the example, all the Seldon Prometheus metrics of the form `seldon_model_*_total` are
considered, excluding metrics pre-aggregated across all models (`.*_aggregate_.*`) as well as
the cumulative inference time per model (`.*_seconds_total`). For RPS, we are only interested in
the model inference count (`seldon_model_infer_total`).

* _Association_ (the `resources` key): controls the Kubernetes resources that a particular
metric can be attached to or aggregated over.

The `resources` key defines an association between certain labels from the Prometheus metric and
k8s resources. For example, `"model": {group: "mlops.seldon.io", resource: "model"}` lets
`prometheus-adapter` know that, for the selected Prometheus metrics, the value of the "model"
label represents the name of a k8s `model.mlops.seldon.io` CR.

One k8s custom metric is generated for each k8s resource associated with a prometheus metric.
In this way, it becomes possible to request the k8s custom metric values for
`models.mlops.seldon.io/iris` or for `servers.mlops.seldon.io/mlserver`.

The labels that *do not* refer to a `namespace` resource generate "namespaced" custom
metrics (their values refer to resources that exist within a namespace).


* _Naming_ (the `name` key): configures the naming of the k8s custom metric

In the example ConfigMap, this is configured to take the Prometheus metric named
`seldon_model_infer_total` and expose custom metric endpoints named `infer_rps`, which when
called return the result of a query over the Prometheus metric.

The matching over the Prometheus metric name uses regex group capture expressions, which can
then be referenced in the custom metric name.

* _Querying_ (the `metricsQuery` key): defines how a request for a specific k8s custom metric gets
converted into a Prometheus query.

The query can make use of the following placeholders:

- `.Series` is replaced by the discovered Prometheus metric name (e.g. `seldon_model_infer_total`)
- `.LabelMatchers`, when requesting a namespaced metric for resource `X` with name `x` in
namespace `n`, is replaced by `X=~"x",namespace="n"`. For example, `model=~"iris0",
namespace="seldon-mesh"`. When requesting the namespace resource itself, only the
`namespace="n"` part is kept.
- `.GroupBy` is replaced by the resource type of the requested metric (e.g. `model`,
`server`, `pod` or `namespace`).

You may want to modify the query in the example to match the one that you typically use in
your monitoring setup for RPS metrics. The example calls [`rate()`](https://prometheus.io/docs/prometheus/latest/querying/functions/#rate)
with a 1-minute window.
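Putting the placeholders together: assuming the rule from the ConfigMap above (with its 1-minute `rate()` window), a request for the `infer_rps` metric of the `models.mlops.seldon.io` resource named `irisa0` in namespace `seldon-mesh` would be converted into a Prometheus query roughly like the following (illustrative only; the exact shape depends on your `metricsQuery` template):

```promql
sum by (model) (
  rate (
    seldon_model_infer_total{model=~"irisa0", namespace="seldon-mesh"}[1m]
  )
)
```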


For a complete reference on how `prometheus-adapter` can be configured via the `ConfigMap`, please
consult the [prometheus-adapter configuration docs](https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/config.md).
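To build intuition for what `rate()` computes over a counter such as `seldon_model_infer_total`, here is a simplified sketch in Python. The sampled values are made up for illustration, and the sketch ignores details the real Prometheus implementation handles (counter resets, extrapolation at the window edges):

```python
# Simplified sketch of the per-second rate computation that Prometheus
# rate() performs over a counter metric. Sample values are hypothetical.

def per_second_rate(samples):
    """Approximate rate(): per-second increase of a counter over the
    window covered by `samples`, a list of (timestamp_seconds, value).

    Assumes no counter resets and at least two samples in the window.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# A counter sampled every 15s over a 1-minute window: 120 inferences
# happened during the window, so the RPS is 120 / 60 = 2.0.
window = [(0, 1000), (15, 1030), (30, 1060), (45, 1090), (60, 1120)]
print(per_second_rate(window))  # 2.0
```

Prometheus performs this computation server-side; the sketch is only meant to clarify the role of the 1-minute window in the example `metricsQuery`.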


Once you have applied any necessary customisations, replace the default prometheus-adapter config
with the new one, and restart the deployment (this restart is required so that prometheus-adapter
picks up the new config):

```sh
# Apply prometheus adapter config
kubectl apply -f prometheus-adapter.config.yaml
# Restart prom-adapter pods
# Replace default prometheus adapter config
kubectl replace -f prometheus-adapter.config.yaml
# Restart prometheus-adapter pods
kubectl rollout restart deployment hpa-metrics-prometheus-adapter -n seldon-monitoring
```

@@ -173,25 +172,33 @@ List available metrics
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/ | jq .
```

Fetching model RPS metric for specific (namespace, model) pair:
For namespaced metrics, the general template for fetching is:

```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/models.mlops.seldon.io/irisa0/infer_rps
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/[NAMESPACE]/[API_RESOURCE_NAME]/[CR_NAME]/[METRIC_NAME]"
```

Fetching model RPS metric aggregated at the (namespace, server) level:
For example:

```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/servers.mlops.seldon.io/mlserver/infer_rps
```
* Fetching model RPS metric for specific (namespace, model) pair:

Fetching model RPS metric aggregated at the (namespace, pod) level:
```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/models.mlops.seldon.io/irisa0/infer_rps
```

```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/pods/mlserver-0/infer_rps
```
* Fetching model RPS metric aggregated at the (namespace, server) level:

```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/servers.mlops.seldon.io/mlserver/infer_rps
```

Fetching the same metric aggregated at namespace level:
* Fetching model RPS metric aggregated at the (namespace, pod) level:

```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/seldon-mesh/pods/mlserver-0/infer_rps
```
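Each of these requests returns a `MetricValueList` object from the custom metrics API. For the model-level query, the response has roughly the following shape (the `timestamp` and `value` fields below are hypothetical):

```json
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {
      "describedObject": {
        "kind": "Model",
        "namespace": "seldon-mesh",
        "name": "irisa0",
        "apiVersion": "mlops.seldon.io/v1alpha1"
      },
      "metricName": "infer_rps",
      "timestamp": "2024-10-30T10:00:00Z",
      "value": "2"
    }
  ]
}
```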

Fetching the same metric aggregated at namespace level (not namespaced):

```sh
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/infer_rps
@@ -204,8 +211,13 @@ the same metric: one scaling the Model, the other the Server. The example below
the mapping between Models and Servers is 1-to-1 (i.e. no multi-model serving).

Consider a model named `irisa0` with the following manifest. Please note we don’t set
`minReplicas/maxReplicas` this is in order to disable the seldon-specific autoscaling so that it
doesn’t interact with HPA.
`minReplicas/maxReplicas`. This disables the Seldon lag-based autoscaling so that it
doesn’t interact with HPA (separate `minReplicas/maxReplicas` configs will be set on the HPA
side).

You must also explicitly define a value for `spec.replicas`. This is the field HPA modifies to
change the number of replicas; if it is not present in the manifest, HPA will not work until the
Model CR is updated to define `spec.replicas`.

```yaml
apiVersion: mlops.seldon.io/v1alpha1
