Arkouda Prometheus Exporter and Grafana Integration
The MetricsMsg Chapel module contains the code required to generate and export metrics from Arkouda:
- increment/decrement int metrics stored in a Chapel map
- encapsulate metric data and metadata within a CounterMetric record
- export Arkouda metric data as a JSON blob generated from 1..n CounterMetric objects
The count metric logic is as follows:
var countMetrics = new map(string, int);
proc getCount(metric: string) : int {
    if !countMetrics.contains(metric) {
        countMetrics.add(metric,0);
        return 0;
    }
    return countMetrics.getValue(metric);
}
proc setCount(metric: string, count: int) {
    countMetrics.addOrSet(metric,count);
}
proc incrementCount(metric: string, count: int=1) {
    var current = getCount(metric);
    setCount(metric, current+count);
}
proc decrementCount(metric: string, count: int=1) {
    var current = getCount(metric);
    // clamp at zero so a count never goes negative
    if current >= count {
        setCount(metric, current-count);
    } else {
        setCount(metric,0);
    }
}
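For readers less familiar with Chapel, the intended counter semantics (read-as-zero for unknown metrics, increment/decrement by `count`, clamped at zero) can be modeled in a few lines of Python. `CountMetrics` is a hypothetical illustration, not a class in the Arkouda codebase.

```python
from collections import defaultdict


class CountMetrics:
    """Hypothetical Python model of the Chapel countMetrics map (illustration only)."""

    def __init__(self) -> None:
        # Reading a missing metric inserts it with 0, like getCount() does.
        self._counts: dict[str, int] = defaultdict(int)

    def get_count(self, metric: str) -> int:
        return self._counts[metric]

    def set_count(self, metric: str, count: int) -> None:
        self._counts[metric] = count

    def increment_count(self, metric: str, count: int = 1) -> None:
        self.set_count(metric, self.get_count(metric) + count)

    def decrement_count(self, metric: str, count: int = 1) -> None:
        # Clamp at zero so a count never goes negative.
        self.set_count(metric, max(self.get_count(metric) - count, 0))
```

Clamping in `decrement_count` means a spurious extra decrement (for example, a disconnect recorded twice) leaves the metric at 0 rather than producing a negative gauge value.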
All metrics are exported as a JSON blob via the following logic:
proc exportCountMetrics() throws {
    var metrics: [0..countMetrics.size-1] CounterMetric;
    for (i,item) in zip(0..countMetrics.size-1,countMetrics.items()){
        metrics[i] = new CounterMetric(item[0],item[1]);
    }
    return metrics;
}
record CounterMetric {
    var name: string;
    var value: int;
}
proc metricsMsg(cmd: string, payload: string, st: borrowed SymTab): MsgTuple throws {
    var (metricType, metric) = payload.splitMsgToTuple(2);
    mLogger.debug(getModuleName(),getRoutineName(),getLineNumber(),
                  'metricType: %s metric: %s'.format(metricType,metric));
    if metric == 'ALL' {
        var metrics = exportCountMetrics();
        mLogger.debug(getModuleName(),getRoutineName(),getLineNumber(),
                      'metrics %t'.format(metrics));
        return new MsgTuple("%jt".format(metrics), MsgType.NORMAL);
    } else {
        var countMetric = new CounterMetric(name=metric,value=getCount(metric));
        mLogger.debug(getModuleName(),getRoutineName(),getLineNumber(),
                      'metric %t'.format(countMetric));
        return new MsgTuple("%jt".format(countMetric), MsgType.NORMAL);
    }
}
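To make the request/response contract concrete, here is a minimal Python model of what metricsMsg does with its payload. It assumes that `%jt` renders each CounterMetric as a JSON object with `name` and `value` keys, which is what the Python fetch method reads; `handle_metrics_payload` is a hypothetical name, not part of Arkouda.

```python
import json


def handle_metrics_payload(payload: str, counts: dict[str, int]) -> str:
    """Hypothetical model of metricsMsg; payload is '<metricType> <metric>'."""
    # splitMsgToTuple(2) splits the payload into exactly two fields.
    # metric_type is only logged by the Chapel code, so it is unused here.
    metric_type, metric = payload.split(maxsplit=1)
    if metric == "ALL":
        # One {"name", "value"} object per entry, like exportCountMetrics().
        body = [{"name": name, "value": value} for name, value in counts.items()]
    else:
        # Unknown metrics are reported as 0, which is what getCount() returns.
        body = {"name": metric, "value": counts.get(metric, 0)}
    return json.dumps(body)
```

Note the asymmetry: an `ALL` request returns a JSON array, while a single-metric request returns one JSON object, so a client has to know which form it asked for.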
The MetricsMsg module is integrated into the Arkouda server-side workflow within the arkouda_server module.
The Python monitoring module contains a simple Arkouda Prometheus exporter. The Arkouda implementation is based upon an excellent example developed and documented by Thomas Stringer.
The prometheus_client library (the Python client for Prometheus) provides the core functionality required to build a Prometheus exporter.
In the Arkouda monitoring module, the fetch method calls MetricsMsg with the 'ALL' parameter, so all metrics are returned to the client and then exported to Prometheus:
def fetch(self):
    results = json.loads(client.generic_msg(cmd='metrics', args='MetricType.COUNT ALL'))
    for result in results:
        metricName = result['name']
        metricValue = self._getMetricValue(metricName,result['value'])
        self._updateMetric[metricName](metricValue)
        self._updateCache(metricName,metricValue)
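fetch relies on a dictionary that maps each metric name to an update callable. The sketch below isolates that dispatch-table pattern with plain functions so it runs without an Arkouda server or prometheus_client. The `apply_metrics` function, the `updaters` dictionary, and the choice to skip unknown metrics (a bare dictionary lookup like `self._updateMetric[metricName]` would raise KeyError instead) are assumptions for illustration.

```python
import json
from typing import Callable


def apply_metrics(raw_json: str, updaters: dict[str, Callable[[int], None]]) -> list[str]:
    """Parse an 'ALL' metrics reply and route each value to its updater.

    Returns the names of metrics that had no registered updater.
    """
    skipped = []
    for result in json.loads(raw_json):
        name, value = result["name"], result["value"]
        update = updaters.get(name)
        if update is None:
            # Assumption: unknown metrics are skipped rather than raising KeyError.
            skipped.append(name)
            continue
        update(value)
    return skipped
```

Skipping keeps the exporter alive when the server starts reporting a metric the exporter does not know about yet; returning the skipped names makes that visible for logging.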
In the first version, two metrics are maintained within the ArkoudaMetrics class:
self.numberOfRequests = Counter('arkouda_number_of_requests',
                                'Number of Arkouda requests')
self.numberOfConnections = Gauge('arkouda_number_of_connections',
                                 'Number of Arkouda connections')
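One detail matters when choosing between these two types: in prometheus_client a Counter can only be increased (`inc()` rejects negative amounts), while a Gauge can be `set()` to any value. The request count that Arkouda reports is a running total, so feeding it into a Counter means incrementing by the change since the last poll. That is presumably what the cache in fetch supports, although `_getMetricValue` and `_updateCache` are not shown here. The stdlib-only sketch below models that delta computation; `CounterDelta` is hypothetical.

```python
class CounterDelta:
    """Hypothetical helper: turn successive running totals into Counter increments."""

    def __init__(self) -> None:
        self._last: dict[str, int] = {}

    def increment_for(self, name: str, total: int) -> int:
        previous = self._last.get(name, 0)
        self._last[name] = total
        # A total below the cached value suggests the server restarted and its
        # counts reset, so treat the new total as the full increment.
        return total - previous if total >= previous else total
```

The returned amount would be passed to `Counter.inc()`, which accepts 0. A Gauge such as arkouda_number_of_connections needs no such bookkeeping and can simply be `set()` to the reported value.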
Within the main function, (1) the HTTP server that serves as the scrape endpoint starts up and (2) the run_metrics_loop method periodically retrieves metric data from Arkouda:
def main():
    """Main entry point"""
    pollingInterval = int(os.getenv("POLLING_INTERVAL_SECONDS", "5"))
    exportPort = int(os.getenv("EXPORT_PORT", "5080"))
    metrics = ArkoudaMetrics(
        exportPort=exportPort,
        pollingInterval=pollingInterval
    )
    start_http_server(exportPort)
    metrics.run_metrics_loop()
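run_metrics_loop itself is not shown above. A plausible shape is a fetch-then-sleep loop, and the sketch below assumes exactly that. The `max_iterations` parameter is hypothetical (not in the original) and exists only so the loop can be exercised without running forever.

```python
import time
from typing import Callable, Optional


def run_metrics_loop(fetch: Callable[[], None], polling_interval: float,
                     max_iterations: Optional[int] = None) -> int:
    """Call fetch() every polling_interval seconds; return the number of polls."""
    polls = 0
    while max_iterations is None or polls < max_iterations:
        fetch()
        polls += 1
        time.sleep(polling_interval)
    return polls
```

Because start_http_server serves the scrape endpoint from a background thread, the loop can block in `time.sleep` without stopping Prometheus from scraping the last fetched values.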
Finally, Prometheus periodically scrapes the latest Arkouda metrics exposed by the ArkoudaMetrics exporter.
Prometheus can be deployed bare metal or on Kubernetes. There is a set of Prometheus Helm charts that can be used to deploy 1..n elements of the Prometheus-Grafana stack.
Prometheus is configured via the prometheus.yaml file.
In the scrape_configs section, add a job whose static_configs entry points at the ArkoudaMetrics exporter:
scrape_configs:
  - job_name: arkouda
    static_configs:
      - targets:
          - $ARKOUDA_METRICS_HOST:$ARKOUDA_METRICS_PORT
        labels:
          arkouda_instance: integration-test
          launch_method: Kubernetes
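Prometheus does not, in general, expand environment variables such as $ARKOUDA_METRICS_HOST inside scrape_configs, so the placeholders need to be filled in before the file is loaded (for example by a templating step in the deployment). A minimal sketch using Python's string.Template; `render_target` and the host and port values in the usage are hypothetical, though the port should match the exporter's EXPORT_PORT (5080 by default in main()).

```python
from string import Template

SCRAPE_TARGET = Template("$ARKOUDA_METRICS_HOST:$ARKOUDA_METRICS_PORT")


def render_target(env: dict[str, str]) -> str:
    # substitute() raises KeyError if either placeholder is missing,
    # which is preferable to silently writing an unresolved target.
    return SCRAPE_TARGET.substitute(env)
```

For example, `render_target({"ARKOUDA_METRICS_HOST": "arkouda-metrics", "ARKOUDA_METRICS_PORT": "5080"})` yields `arkouda-metrics:5080`.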
The Helm install of just the Prometheus server is as follows, using a custom values file (prometheus-values.yaml) that contains the above scrape_configs entry:
helm install prometheus -f prometheus-values.yaml prometheus-community/prometheus
The following pods are deployed:
NAME READY STATUS RESTARTS AGE
prometheus-alertmanager-9c55db574-nkbdc 2/2 Running 0 3h39m
prometheus-kube-state-metrics-76f66976cb-gvhjn 1/1 Running 0 3h39m
prometheus-node-exporter-j2htp 0/1 Pending 0 3h39m
prometheus-pushgateway-598d657c9-rhfh2 1/1 Running 0 3h39m
prometheus-server-b6ddb8f5c-wtmh8 2/2 Running 0 3h39m
The arkouda/monitoring.py file has a main function, which is the entry point into the Arkouda Prometheus exporter; the exporter can be deployed on bare metal or on Kubernetes.
To run it on bare metal, use the following shell script:
#!/bin/bash
export METRICS_SERVICE_NAME=<kubernetes external service name or arkouda server hostname>
export METRICS_SERVICE_PORT=5556
export POLLING_INTERVAL_SECONDS=5
export EXPORT_PORT=5080
export ARKOUDA_SERVER_NAME=arkouda-ventura-metrics-exporter
python3 arkouda/monitoring.py
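The script configures the exporter entirely through environment variables. The sketch below shows one way to gather them into a single typed object. The dataclass, `load_config`, which variables are treated as required, and the empty default for ARKOUDA_SERVER_NAME are assumptions; the POLLING_INTERVAL_SECONDS and EXPORT_PORT defaults mirror main().

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ExporterConfig:
    metrics_service_name: str
    metrics_service_port: int
    polling_interval_seconds: int
    export_port: int
    arkouda_server_name: str


def load_config(env: Mapping[str, str]) -> ExporterConfig:
    """Hypothetical loader; pass os.environ in real use."""
    return ExporterConfig(
        metrics_service_name=env["METRICS_SERVICE_NAME"],  # assumed required
        metrics_service_port=int(env["METRICS_SERVICE_PORT"]),  # assumed required
        polling_interval_seconds=int(env.get("POLLING_INTERVAL_SECONDS", "5")),
        export_port=int(env.get("EXPORT_PORT", "5080")),
        arkouda_server_name=env.get("ARKOUDA_SERVER_NAME", ""),  # assumed optional
    )
```

Failing fast with a KeyError on a missing METRICS_SERVICE_NAME is easier to diagnose than an exporter that starts but can never reach the Arkouda server.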
The arkouda-metrics-exporter-chart Helm chart encapsulates the configuration and logic required to deploy the prometheus-arkouda-exporter on Kubernetes. The arkouda-metrics-exporter-chart is most often used to scrape metrics from Arkouda instances deployed on Slurm, on bare metal with UDP, and on other non-Kubernetes schedulers.
The multilocale-dynamic-arkouda-server-chart values.yaml contains a configuration section for the prometheus-arkouda-exporter. Also included is the dynamic-scrape-target subchart used to add/remove Arkouda Prometheus scrape targets upon Helm install/delete of Arkouda, respectively.
The Arkouda server must be started with two flags: --collectMetrics, which enables the metrics server thread, and --memTrack, which enables memory tracking:
./arkouda_server -nl 1 --logLevel=LogLevel.DEBUG --collectMetrics --memTrack
After starting both Prometheus and the Arkouda Prometheus Exporter, check that the expected Arkouda metrics are available as queries in Prometheus.
The next step is to execute one of the queries and confirm that the result is correct. In this example, the Prometheus and Arkouda results for the arkouda_number_of_connections query match, so the Arkouda-Prometheus stack is configured correctly.
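The same check can be scripted against the Prometheus HTTP query API (`GET /api/v1/query?query=...`), whose instant-vector responses carry each sample as `[timestamp, "value as string"]`. The sketch below only builds the URL and parses a response body, so it runs without a live server. The function names, the base URL, and the sample response in the usage are made up for illustration.

```python
import json
from urllib.parse import urlencode


def query_url(base_url: str, promql: str) -> str:
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def first_sample_value(response_body: str) -> float:
    """Return the value of the first series in an instant-vector query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    results = payload["data"]["result"]
    if not results:
        raise LookupError("query returned no series")
    _timestamp, value = results[0]["value"]
    # Prometheus encodes sample values as strings.
    return float(value)
```

Comparing `first_sample_value(...)` for arkouda_number_of_connections with the value in Arkouda's own metrics reply automates the manual comparison described above. Expect small differences if a connection opens or closes between the Arkouda poll and the Prometheus scrape.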
The kube-prometheus-stack Helm chart contains the Prometheus-Grafana stack and is straightforward to use. A representative deployment is as follows:
helm install kube-stack -n monitoring prometheus-community/kube-prometheus-stack
The following pods are deployed:
bash-3.2$ kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-kube-stack-kube-prometheus-alertmanager-0 2/2 Running 4 27h
kube-stack-grafana-5c5bd67fb7-l4mrd 2/2 Running 4 28h
kube-stack-kube-prometheus-operator-86484bc8f9-58dv4 1/1 Running 3 28h
kube-stack-kube-state-metrics-56c5bc7d4-s7lpg 1/1 Running 3 28h
kube-stack-prometheus-node-exporter-rwf5k 1/1 Running 3 28h
prometheus-kube-stack-kube-prometheus-prometheus-0 2/2 Running 4 27h
Pulling Arkouda metrics into Grafana via Prometheus involves adding the corresponding Prometheus instance as a data source:
The key configuration element is specifying the prometheus-server svc host:port:
bash-3.2$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 23d
prom-server NodePort 10.107.26.143 <none> 80:32101/TCP 3h43m
prometheus-alertmanager ClusterIP 10.101.149.208 <none> 80/TCP 3h46m
prometheus-kube-state-metrics ClusterIP 10.99.115.193 <none> 8080/TCP 3h46m
prometheus-node-exporter ClusterIP None <none> 9100/TCP 3h46m
prometheus-pushgateway ClusterIP 10.100.76.215 <none> 9091/TCP 3h46m
prometheus-server ClusterIP 10.99.141.177 <none> 80/TCP 3h46m
In this case, the hostname is prometheus-server.default, since the service is deployed in the default Kubernetes namespace, and the port is 80. Accordingly, the data source URL is http://prometheus-server.default:80.
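The namespace suffix matters here: Grafana from kube-prometheus-stack runs in the monitoring namespace while prometheus-server runs in default, so the bare service name would not resolve from Grafana's pod. Kubernetes service DNS makes a service reachable as `<service>.<namespace>`, or by the fully qualified `<service>.<namespace>.svc.<cluster domain>`. A small helper, hypothetical and assuming the common `cluster.local` cluster domain, that builds the URL:

```python
def prometheus_datasource_url(service: str, namespace: str, port: int,
                              fully_qualified: bool = False,
                              cluster_domain: str = "cluster.local") -> str:
    """Build a Grafana data source URL for a Prometheus Kubernetes service."""
    host = f"{service}.{namespace}"
    if fully_qualified:
        # Fully qualified form; cluster_domain is an assumption, check your cluster.
        host = f"{host}.svc.{cluster_domain}"
    return f"http://{host}:{port}"
```

`prometheus_datasource_url("prometheus-server", "default", 80)` produces the URL used above.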
The surest way to confirm that the Prometheus data source is configured correctly is to define a query: type the first few letters of an Arkouda metric name and check that the target Prometheus instance suggests matching metrics.
Next, create a dashboard panel and specify the Prometheus queries that return the desired metrics.
Enabling labels for Gauges and Counters requires the following configuration in the Prometheus client library (shown here with the Python prometheus_client):
# Configuring the Gauge
self.pctMemoryUsed = Gauge('arkouda_pct_memory_used_per_locale',
                           'Percent memory used by Arkouda on each locale',
                           labelnames=['locale_name','locale_num'])
# Initializing the Gauge
locale_num = locale['id']
locale_name = locale['name']
self.pctMemoryUsed.labels(locale_name,locale_num)
# Setting Gauge value for each label combo
metric = Metric(scope=LOCALE, type_=PCT_MEMORY_USED, value=19.0,
                labels=[Label(name='locale_name', value='finkel'),
                        Label(name='locale_num', value=1)])
self.pctMemoryUsed.labels(locale_name=metric.labels[0].value,
                          locale_num=metric.labels[1].value).set(metric.value)
All of this example code is in the monitoring module.
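On the scrape endpoint, each label combination of the labeled gauge becomes its own time series in the Prometheus text exposition format. The stdlib-only sketch below renders such a gauge by hand to show what Prometheus receives. `render_gauge` is hypothetical and simplified: label values are escaped for backslash, double quote, and newline as the format requires, but no other validation is done. In practice prometheus_client generates this output itself.

```python
def render_gauge(name: str, help_text: str,
                 samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one gauge family in Prometheus text exposition format."""
    def escape(value: str) -> str:
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{escape(str(v))}"' for k, v in labels.items())
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"
```

With the `finkel` locale from the example above, the sample line comes out as `arkouda_pct_memory_used_per_locale{locale_name="finkel",locale_num="1"} 19.0`. Note that label values are always strings on the wire, even though `locale_num` was passed as the integer 1.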