Arkouda Prometheus Exporter and Grafana Integration

Arkouda Prometheus Exporter

Arkouda Generation of Exportable Metrics

The MetricsMsg Chapel module contains the code required to generate and export metrics from Arkouda:

  1. Increment/decrement integer count metrics encapsulated within a Chapel map
  2. Encapsulate metric data and metadata within a CounterMetric record
  3. Export Arkouda metric data as a JSON blob generated from 1..n CounterMetric objects

The count metric logic is as follows:

    // map of metric name -> current count
    var countMetrics = new map(string, int);

    // returns the current count for a metric, initializing it to zero on first access
    proc getCount(metric: string) : int {
        if !countMetrics.contains(metric) {
            countMetrics.add(metric,0);
            return 0;
        }
        
        return countMetrics.getValue(metric);
    }
    
    // sets the count for a metric, adding the entry if needed
    proc setCount(metric: string, count: int) {
        countMetrics.addOrSet(metric,count);
    }
    
    // increments the metric by count (default 1)
    proc incrementCount(metric: string, count: int=1) {
        var current = getCount(metric);
        setCount(metric, current+count);
    }
    
    // decrements the metric by count (default 1), clamping at zero
    proc decrementCount(metric: string, count: int=1) {
        var current = getCount(metric);
        if current >= count {
            setCount(metric, current-count);
        } else {
            setCount(metric,0);
        }
    }

All metrics are exported as a JSON blob via the following logic:

    // packages all count metrics into an array of CounterMetric records
    proc exportCountMetrics() throws {
        var metrics: [0..countMetrics.size-1] CounterMetric;
         
        for (i,item) in zip(0..countMetrics.size-1,countMetrics.items()){
            metrics[i] = new CounterMetric(item[0],item[1]);
        }
        return metrics;
    }
    
    record CounterMetric {
        var name: string;
        var value: int;
    }    

    proc metricsMsg(cmd: string, payload: string, st: borrowed SymTab): MsgTuple throws {       
        var (metricType, metric) = payload.splitMsgToTuple(2);
        
        mLogger.debug(getModuleName(),getRoutineName(),getLineNumber(),
                            'metricType: %s metric: %s'.format(metricType,metric));
        if metric == 'ALL' {
            var metrics = exportCountMetrics();

            mLogger.debug(getModuleName(),getRoutineName(),getLineNumber(),
                            'metrics %t'.format(metrics));
            return new MsgTuple("%jt".format(metrics), MsgType.NORMAL);        
        } else { 
            var countMetric = new CounterMetric(name=metric,value=getCount(metric));
            mLogger.debug(getModuleName(),getRoutineName(),getLineNumber(),
                            'metric %t'.format(countMetric));
            return new MsgTuple("%jt".format(countMetric), MsgType.NORMAL);
        }
    }
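
For reference, the JSON blob produced by the "%jt" format string for the 'ALL' case is an array of name/value objects, one per CounterMetric. A minimal sketch of parsing such a blob on the Python side, with illustrative metric names rather than exact server output:

    import json

    # example payload shaped like the "%jt"-formatted CounterMetric array
    payload = '[{"name": "numRequests", "value": 42}, {"name": "connections", "value": 3}]'
    for m in json.loads(payload):
        print(m['name'], m['value'])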

The MetricsMsg module is integrated into the Arkouda server-side workflow within the arkouda_server module.

Python Prometheus Client for Arkouda

The arkouda/monitoring.py Python module contains a simple Arkouda Prometheus exporter. The Arkouda implementation is based upon an excellent example developed and documented by Thomas Stringer.

Core Logic of Arkouda Prometheus Exporter

The prometheus_client Python library provides the core functionality required to build a Prometheus exporter.
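
As a minimal sketch of the exporter pattern (the metric name and port here are illustrative, not the Arkouda ones), an exporter registers a metric and serves it over HTTP for Prometheus to scrape:

    import time

    from prometheus_client import Gauge, start_http_server

    # illustrative metric; Arkouda's actual metrics are defined in ArkoudaMetrics
    example_gauge = Gauge('example_value', 'An example exported value')

    if __name__ == '__main__':
        start_http_server(8000)  # scrape endpoint at http://localhost:8000/metrics
        while True:
            example_gauge.set(42.0)  # a real exporter would poll the target system here
            time.sleep(5)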

Arkouda Metrics Prometheus Exporter

In the Arkouda monitoring module, the fetch method calls MetricsMsg with the 'ALL' parameter, meaning that all metrics are returned to the client and then exported to Prometheus:

    def fetch(self):
        # request all count metrics from the Arkouda server as a JSON blob
        results = json.loads(client.generic_msg(cmd='metrics', args='MetricType.COUNT ALL'))
        
        for result in results:
            metricName = result['name']
            metricValue = self._getMetricValue(metricName,result['value'])
            # dispatch the value to the matching Prometheus metric, then cache it
            self._updateMetric[metricName](metricValue)
            self._updateCache(metricName,metricValue)
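
The _updateMetric attribute referenced above is a dict that maps each metric name to a callable that updates the corresponding Prometheus metric object. A minimal sketch of how it might be wired up, with metric names assumed for illustration rather than taken from the actual module:

    # hypothetical wiring; the actual names live in the ArkoudaMetrics constructor
    self._updateMetric = {
        'total_number_of_requests': lambda value: self.numberOfRequests.inc(value),
        'number_of_connections': lambda value: self.numberOfConnections.set(value)
    }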

In the first version, two metrics are maintained within the ArkoudaMetrics class:

        self.numberOfRequests = Counter('arkouda_number_of_requests',
                                        'Number of Arkouda requests')
        self.numberOfConnections = Gauge('arkouda_number_of_connections',
                                         'Number of Arkouda connections')

Within the main loop (1) the HTTP server that constitutes the scrape endpoint starts up and (2) the run_metrics_loop method periodically retrieves metric data from Arkouda.

import os

from prometheus_client import start_http_server

def main():
    """Main entry point"""

    pollingInterval = int(os.getenv("POLLING_INTERVAL_SECONDS", "5"))
    exportPort = int(os.getenv("EXPORT_PORT", "5080"))

    metrics = ArkoudaMetrics(
        exportPort=exportPort,
        pollingInterval=pollingInterval
    )
    start_http_server(exportPort)  # (1) start the HTTP scrape endpoint
    metrics.run_metrics_loop()     # (2) periodically poll Arkouda for metrics
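
run_metrics_loop itself is a simple polling loop. A minimal sketch, assuming the pollingInterval attribute is set in the ArkoudaMetrics constructor:

    import time

    def run_metrics_loop(self):
        """Fetch metrics from Arkouda every pollingInterval seconds (sketch)"""
        while True:
            self.fetch()
            time.sleep(self.pollingInterval)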

Prometheus scrape of ArkoudaMetrics

Finally, Prometheus periodically scrapes the latest Arkouda metrics from the ArkoudaMetrics scrape endpoint.
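
To see exactly what Prometheus will scrape, the exporter endpoint can be read directly. A quick check, assuming the exporter is running locally with EXPORT_PORT=5080:

    import urllib.request

    # print the exposition-format text served at the scrape endpoint
    with urllib.request.urlopen('http://localhost:5080/metrics') as response:
        print(response.read().decode())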

Deployment

Prometheus

Prometheus can be deployed on bare metal or on Kubernetes. There is a set of Prometheus Helm charts that can be used to deploy 1..n elements of the Prometheus-Grafana stack.

Prometheus Configuration

Prometheus is configured via the prometheus.yaml file.

Within the scrape_configs section, add an entry for the ArkoudaMetrics app:

    scrape_configs:
      - job_name: arkouda
        static_configs:
          - targets:
              - $ARKOUDA_METRICS_HOST:$ARKOUDA_METRICS_PORT
            labels:
              arkouda_instance: integration-test
              launch_method: Kubernetes

Prometheus Helm Install

The Helm install of just the Prometheus server is as follows, with a custom prometheus-values.yaml file containing the above scrape_configs entry:

helm install prometheus -f prometheus-values.yaml prometheus-community/prometheus

The following pods are deployed:

NAME                                                        READY   STATUS    RESTARTS   AGE
prometheus-alertmanager-9c55db574-nkbdc                     2/2     Running   0          3h39m
prometheus-kube-state-metrics-76f66976cb-gvhjn              1/1     Running   0          3h39m
prometheus-node-exporter-j2htp                              0/1     Pending   0          3h39m
prometheus-pushgateway-598d657c9-rhfh2                      1/1     Running   0          3h39m
prometheus-server-b6ddb8f5c-wtmh8                           2/2     Running   0          3h39m

Arkouda Prometheus Exporter Deployment

The arkouda/monitoring.py file has a main method that serves as the entry point into the Arkouda Prometheus exporter, which can be deployed on bare metal or on Kubernetes.

Bare Metal

To run on bare metal, run the following shell script:

#!/bin/bash
  
export METRICS_SERVICE_NAME=<kubernetes external service name or arkouda server hostname>
export METRICS_SERVICE_PORT=5556
export POLLING_INTERVAL_SECONDS=5
export EXPORT_PORT=5080
export ARKOUDA_SERVER_NAME=arkouda-ventura-metrics-exporter

python3 arkouda/monitoring.py

Kubernetes

The arkouda-metrics-exporter-chart Helm chart encapsulates the configuration and logic required to deploy the prometheus-arkouda-exporter on Kubernetes. The arkouda-metrics-exporter-chart is most often used to scrape metrics from Arkouda instances deployed via Slurm, on bare metal with UDP, or via other non-Kubernetes schedulers.

Co-Deployed with Arkouda-on-Kubernetes

The multilocale-dynamic-arkouda-server-chart values.yaml contains a configuration section for the prometheus-arkouda-exporter. Also included is the dynamic-scrape-target subchart used to add/remove Arkouda Prometheus scrape targets upon Helm install/delete of Arkouda, respectively.

Arkouda Deployment

The Arkouda deployment requires setting two flags: one that configures the metrics server thread and one that enables memory tracking:

./arkouda_server -nl 1 --logLevel=LogLevel.DEBUG --collectMetrics --memTrack 

Arkouda-Prometheus Integration

After starting both Prometheus and the Arkouda Prometheus Exporter, confirm that the expected queries are present in Prometheus.

The next step is to execute one of the queries and ensure the returned result is correct. If the Prometheus and Arkouda results for the arkouda_number_of_connections query match, the Arkouda-Prometheus stack is configured correctly.
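
This check can also be scripted against the Prometheus HTTP API. A minimal sketch, assuming Prometheus is reachable at localhost:9090:

    import json
    import urllib.parse
    import urllib.request

    # ask Prometheus for the current value of the Arkouda connections gauge
    query = urllib.parse.urlencode({'query': 'arkouda_number_of_connections'})
    with urllib.request.urlopen('http://localhost:9090/api/v1/query?' + query) as response:
        print(json.loads(response.read())['data']['result'])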

Grafana Deployment and Prometheus Integration

Grafana Kubernetes Deployment

The kube-prometheus-stack Helm chart contains the full Prometheus-Grafana stack and greatly simplifies deploying the two together. A representative deployment is as follows:

helm install kube-stack -n monitoring prometheus-community/kube-prometheus-stack

The following pods are deployed:

bash-3.2$ kubectl get pods -n monitoring
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-kube-stack-kube-prometheus-alertmanager-0   2/2     Running   4          27h
kube-stack-grafana-5c5bd67fb7-l4mrd                      2/2     Running   4          28h
kube-stack-kube-prometheus-operator-86484bc8f9-58dv4     1/1     Running   3          28h
kube-stack-kube-state-metrics-56c5bc7d4-s7lpg            1/1     Running   3          28h
kube-stack-prometheus-node-exporter-rwf5k                1/1     Running   3          28h
prometheus-kube-stack-kube-prometheus-prometheus-0       2/2     Running   4          27h 

Prometheus-Grafana Integration

Pulling Arkouda metrics into Grafana via Prometheus involves adding the corresponding Prometheus instance as a data source.

The key configuration element is specifying the prometheus-server svc host:port:

bash-3.2$ kubectl get svc
NAME                                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kubernetes                                 ClusterIP   10.96.0.1        <none>        443/TCP        23d
prom-server                                NodePort    10.107.26.143    <none>        80:32101/TCP   3h43m
prometheus-alertmanager                    ClusterIP   10.101.149.208   <none>        80/TCP         3h46m
prometheus-kube-state-metrics              ClusterIP   10.99.115.193    <none>        8080/TCP       3h46m
prometheus-node-exporter                   ClusterIP   None             <none>        9100/TCP       3h46m
prometheus-pushgateway                     ClusterIP   10.100.76.215    <none>        9091/TCP       3h46m
prometheus-server                          ClusterIP   10.99.141.177    <none>        80/TCP         3h46m

In this case, the hostname is prometheus-server.default, since the service is deployed in the default Kubernetes namespace, and the port is 80; the Grafana data source URL is configured accordingly.

The way to know for certain whether the Prometheus data source is correctly configured is to define a query: type the first few letters of a metric name and confirm that Grafana autocompletes queries from the target Prometheus instance.

Finally, a dashboard panel is created and the Prometheus queries returning the desired metrics are specified.
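
For example, a panel might chart the connection gauge directly via the arkouda_number_of_connections query, or the request rate via rate(arkouda_number_of_requests_total[5m]); note that the Python prometheus_client appends the _total suffix to Counter metrics in the exposition format.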

Prometheus Client Configuration

Enabling Labels

Enabling labels for Gauges and Counters involves the following configuration in the prometheus-client library (shown here in the Python variant):

# Configuring the Gauge
self.pctMemoryUsed = Gauge('arkouda_pct_memory_used_per_locale', 
                            'Percent memory used by Arkouda on each locale',
                            labelnames=['locale_name','locale_num'])

# Initializing the Gauge
locale_num = locale['id']
locale_name = locale['name']
self.pctMemoryUsed.labels(locale_name,locale_num)

# Setting the Gauge value for each label combination
metric = Metric(scope=LOCALE, type_=PCT_MEMORY_USED, value=19.0,
                labels=[Label(name='locale_name', value='finkel'),
                        Label(name='locale_num', value=1)])

self.pctMemoryUsed.labels(locale_name=metric.labels[0].value,
                          locale_num=metric.labels[1].value).set(metric.value)

All of this example code is in the monitoring module.