Add Prometheus rules to swan-cern chart
This is done so that the rules are deployed every time
the swan-cern chart is deployed.
They define recording rules that are evaluated every 24h to create
metrics about SWAN users. The rules with "list" in their name
generate a list of the unique users (one series per user pod) seen
in the last 24h. The rules with "peak" in their name calculate the
peak number of concurrent users during that day, sampled every
5 minutes. Both kinds of rules exist for normal users and for
GPU users.
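The following Python sketch makes the difference between the two kinds of rules concrete. It models one day of data as a list of sets of running `jupyter-*` pod names, one set per 5-minute step (the subquery step used by the peak rules). This is a simplified model of the PromQL semantics, not Prometheus itself. The data model, function names and sample pod names are hypothetical.

```python
from typing import List, Set

# Hypothetical data model: one set of running jupyter-* pod names per 5-minute step.
Samples = List[Set[str]]


def unique_list(samples: Samples) -> List[str]:
    """Model of the *:unique:list:daily rules: every pod seen at any step
    of the 24h window appears exactly once in the result."""
    seen: Set[str] = set()
    for running in samples:
        seen |= running
    return sorted(seen)


def peak(samples: Samples) -> int:
    """Model of the *:peak:daily rules: the largest number of pods running
    at the same 5-minute step (max_over_time over a [24h:5m] subquery)."""
    return max((len(running) for running in samples), default=0)


if __name__ == "__main__":
    day = [
        {"jupyter-alice"},
        {"jupyter-alice", "jupyter-bob"},
        {"jupyter-carol"},
    ]
    print(len(unique_list(day)))  # 3 distinct users over the day
    print(peak(day))              # 2 users at the busiest step
```

The number of distinct users for the day is the number of series returned by a list rule. In PromQL you could obtain it with, for example, `count(swan:users:unique:list:daily)`. This count is in general larger than the value of the corresponding peak rule.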
PMax5 authored and etejedor committed Apr 19, 2024
1 parent fb809ed commit e27c249
Showing 1 changed file with 21 additions and 0 deletions.
swan-cern/templates/prometheus/rules.yaml: 21 additions, 0 deletions
@@ -0,0 +1,21 @@
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: prometheus
    release: cern-magnum
  name: swan.rules
  namespace: {{ .Release.Namespace }}
spec:
  groups:
  - interval: 24h
    name: swan.rules.users
    rules:
    - expr: count(last_over_time(kube_pod_status_phase{namespace="swan",phase="Running",pod=~"jupyter-[a-z]+"}[24h])>0) by (pod)
      record: swan:users:unique:list:daily
    - expr: max_over_time(count(count(kube_pod_status_phase{namespace="swan",phase="Running",pod=~"jupyter-[a-z]+"}>0) by (pod))[24h:5m])
      record: swan:users:peak:daily
    - expr: count(last_over_time(kube_pod_container_resource_requests{resource=~"nvidia_com_(gpu|mig.+)"}[24h])) by (pod)
      record: swan:users:gpu:unique:list:daily
    - expr: max_over_time(count(count(kube_pod_container_resource_requests{resource=~"nvidia_com_(gpu|mig.+)"}) by(pod))[24h:5m])
      record: swan:users:gpu:peak:daily
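PromQL label matchers that use `=~` are fully anchored. As a result, `pod=~"jupyter-[a-z]+"` only selects pods whose whole name is `jupyter-` followed by lowercase letters.

The Python sketch below approximates this behaviour with `re.fullmatch`. Prometheus uses RE2 syntax, which agrees with Python's `re` for these simple patterns. The patterns are copied from the rules above. The pod and resource names in the example are hypothetical, and the GPU resource names are assumptions about how the resources are exposed.

```python
import re

# Patterns copied from the recording rules above.
POD_PATTERN = r"jupyter-[a-z]+"
GPU_RESOURCE_PATTERN = r"nvidia_com_(gpu|mig.+)"


def promql_regex_match(pattern: str, value: str) -> bool:
    """Approximate a PromQL =~ matcher: the regex must match the whole value."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    # Hypothetical label values, for illustration only.
    for pod in ["jupyter-alice", "jupyter-alice2", "hub-5d9f"]:
        print(pod, promql_regex_match(POD_PATTERN, pod))
    for resource in ["nvidia_com_gpu", "nvidia_com_mig_1g_5gb", "cpu"]:
        print(resource, promql_regex_match(GPU_RESOURCE_PATTERN, resource))
```

Under this anchoring, the user rules would not match pods whose names contain characters outside `[a-z]` after the `jupyter-` prefix, such as `jupyter-alice2`.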
