Currently, this project is heavily based on: https://github.com/kayrus/prometheus-kubernetes
This is a work in progress, but I believe it works.
1. Create a namespace to group our resources and export its name as the NAMESPACE environment variable. In our case, we named it monitoring:
$ kubectl create namespace monitoring
namespace "monitoring" created
$ export NAMESPACE=monitoring
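If you re-run the setup, kubectl create namespace fails once the namespace already exists. Below is a minimal sketch of an idempotent alternative. It assumes kubectl 1.18 or newer (older releases spell the flag as plain --dry-run), and the function name ensure_namespace is hypothetical, not part of this repo:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in this repo): create the namespace only if needed.
# Defaults to $NAMESPACE, then to "monitoring".
ensure_namespace() {
  local ns="${1:-${NAMESPACE:-monitoring}}"
  # Render the Namespace manifest client-side and apply it, so re-running
  # this step does not fail with "AlreadyExists".
  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
}
```

Usage: ensure_namespace "$NAMESPACE"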
2. Create a TLS secret named etcd-tls-client-certs
Our Prometheus Deployment uses a TLS key pair and TLS client authentication to access the etcd cluster.
2.1 Generate keys
$ openssl req \
-x509 -newkey rsa:2048 -nodes -days 365 \
-keyout tls.key -out tls.crt -subj '/CN=localhost'
Generating a 2048 bit RSA private key
.................................................................+++
............................................................................................+++
writing new private key to 'tls.key'
-----
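Before storing the key pair in a secret, it can help to confirm that tls.key really belongs to tls.crt. The sketch below compares the public key embedded in the certificate with the one derived from the private key; the function names cert_matches_key and show_cert are hypothetical helpers, not part of this repo:

```shell
#!/usr/bin/env bash
# Succeeds only if the private key matches the certificate's public key.
cert_matches_key() {
  local crt="${1:-tls.crt}" key="${2:-tls.key}"
  local from_cert from_key
  from_cert=$(openssl x509 -in "$crt" -noout -pubkey) || return 1
  from_key=$(openssl pkey -in "$key" -pubout) || return 1
  [ "$from_cert" = "$from_key" ]
}

# Print the certificate subject and expiry date (365 days with the command above).
show_cert() {
  openssl x509 -in "${1:-tls.crt}" -noout -subject -enddate
}
```

Usage, in the directory where you ran openssl req: cert_matches_key && show_cert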
2.2 Create secret
$ kubectl create secret tls etcd-tls-client-certs --cert=tls.crt --key=tls.key -n=monitoring
secret "etcd-tls-client-certs" created
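To check what actually landed in the cluster, you can decode the certificate stored in the secret. A secret created with kubectl create secret tls keeps the PEM files base64-encoded under the keys tls.crt and tls.key. This is a sketch; show_secret_cert is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Decode the certificate from the TLS secret and print its subject and expiry.
show_secret_cert() {
  local ns="${NAMESPACE:-monitoring}" name="${1:-etcd-tls-client-certs}"
  # The dot in the key name "tls.crt" must be escaped in JSONPath.
  kubectl get secret "$name" -n "$ns" -o jsonpath='{.data.tls\.crt}' \
    | base64 --decode \
    | openssl x509 -noout -subject -enddate
}
```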
Only a Slack alert template and a Slack alert configuration are included. Set the Slack API URL in the Alertmanager configuration (config/alertmanager-cm) to the incoming webhook URL from your Slack configuration.
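Before pasting the webhook URL into Alertmanager (the global slack_api_url setting or the api_url of a slack_configs receiver), you can send it a test message. Slack incoming webhooks accept a JSON body with a "text" field. This sketch assumes curl is installed; SLACK_WEBHOOK_URL, slack_payload and slack_test are hypothetical names you set or define yourself:

```shell
#!/usr/bin/env bash
# Build a minimal {"text": ...} payload, escaping backslashes and double quotes.
# Assumes a single-line message.
slack_payload() {
  local text
  text=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"text": "%s"}' "$text"
}

# POST a test message to the webhook; fails if SLACK_WEBHOOK_URL is unset.
slack_test() {
  : "${SLACK_WEBHOOK_URL:?set SLACK_WEBHOOK_URL to your Slack webhook URL}"
  curl -fsS -X POST -H 'Content-type: application/json' \
    --data "$(slack_payload "${1:-Alertmanager test message}")" \
    "$SLACK_WEBHOOK_URL"
}
```

If the message shows up in the expected channel, the same URL should work for Alertmanager.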
Prometheus alert rules already included in this repo:
- NodeCPUUsage > 50%
- NodeLowRootDisk > 80% (relates to the /root-disk mount point inside the node-exporter pod)
- NodeLowDataDisk > 80% (relates to the /data-disk mount point inside the node-exporter pod)
- NodeSwapUsage > 10%
- NodeMemoryUsage > 75%
- NodeLoadAverage (alerts when the node's load average divided by its number of CPUs exceeds 1)
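The NodeLoadAverage condition is simple arithmetic: a 4-CPU node with a load average of 6 has a ratio of 1.5 and would alert, while a load of 2 gives 0.5 and would not. The sketch below reproduces that check locally with awk; it only illustrates the threshold and is not the PromQL rule from config/prometheus-rules-cm. Which load-average window that rule uses is not stated here, so the 1-minute value in the usage line is an assumption:

```shell
#!/usr/bin/env bash
# Print load/cpus with two decimals; return 0 when the ratio exceeds 1,
# i.e. when the NodeLoadAverage alert condition would be met.
load_exceeds_cpus() {
  awk -v load="$1" -v cpus="$2" 'BEGIN {
    ratio = load / cpus
    printf "%.2f\n", ratio
    exit !(ratio > 1)
  }'
}

# Example on a Linux host (1-minute load from /proc/loadavg, CPU count from nproc):
#   load_exceeds_cpus "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" && echo "would alert"
```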
Then deploy all resources by sourcing deploy.sh from the scripts directory:
$ cd scripts/
$ . deploy.sh
configmap "external-url" created
configmap "grafana-imports" created
configmap "prometheus-rules" created
configmap "alertmanager-templates" created
configmap "alertmanager" created
configmap "prometheus" created
deployment "alertmanager" created
service "alertmanager" created
deployment "grafana" created
service "grafana" created
daemonset "node-exporter" created
configmap "prometheus-env" created
deployment "prometheus-deployment" created
service "prometheus-svc" created
Successfully deployed!
NAME READY STATUS RESTARTS AGE
alertmanager-670954578-gw5c0 0/1 ContainerCreating 0 2s
grafana-1556722099-xmkh1 0/2 ContainerCreating 0 1s
node-exporter-mt9c4 0/1 ContainerCreating 0 1s
node-exporter-pgf51 0/1 ContainerCreating 0 1s
node-exporter-v028j 0/1 ContainerCreating 0 1s
node-exporter-vbj2k 0/1 ContainerCreating 0 1s
prometheus-deployment-534706379-965p6 0/1 ContainerCreating 0 1s
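Right after deployment the pods are still in ContainerCreating. Instead of polling kubectl get pods, you can wait for each workload from the output above with kubectl rollout status. This is a sketch with a hypothetical function name; note that rollout status for a DaemonSet only works when it uses the RollingUpdate update strategy (older extensions/v1beta1 DaemonSets defaulted to OnDelete):

```shell
#!/usr/bin/env bash
# Wait for every workload created by deploy.sh; stop at the first failure.
# Optional argument: per-workload timeout (default 300s).
wait_for_monitoring() {
  local ns="${NAMESPACE:-monitoring}" timeout="${1:-300s}" w
  for w in deployment/alertmanager deployment/grafana \
           deployment/prometheus-deployment daemonset/node-exporter; do
    kubectl rollout status "$w" -n "$ns" --timeout="$timeout" || return 1
  done
}
```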
The config directory contains the configuration files that deploy.sh uses to create the ConfigMaps:
- The config/alertmanager-cm directory contains the configuration file for Alertmanager. The ConfigMap is used by the alertmanager Deployment. More info in the docs.
- The config/alertmanager-templates-cm directory contains custom Alertmanager templates. The ConfigMap is used by the alertmanager Deployment. More info here.
- The config/grafana-imports-cm directory contains Grafana dashboards and the Prometheus datasource plugin. The ConfigMap is used by the grafana Deployment.
- The config/prometheus-cm directory contains the configuration file for Prometheus, including the Kubernetes service discovery configs. The ConfigMap is used by the Prometheus Deployment. More info in the docs.
- The config/prometheus-rules-cm directory contains the Prometheus alert rules. The ConfigMap is used by the Prometheus Deployment. More info in the docs.
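A ConfigMap can be built straight from one of these directories with kubectl create configmap --from-file, where every file in the directory becomes one key. The sketch below is a hypothetical illustration, not the contents of deploy.sh: it assumes each ConfigMap is named after its directory minus the -cm suffix (which matches names like prometheus-rules in the deploy output) and that it runs from the repository root. ConfigMaps such as external-url and prometheus-env have no directory listed here and are not covered:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one ConfigMap per config/<name>-cm directory, named <name>.
create_config_maps() {
  local ns="${NAMESPACE:-monitoring}" dir name
  for dir in config/*-cm; do
    [ -d "$dir" ] || continue        # skip if the glob matched nothing
    name=$(basename "$dir")
    name=${name%-cm}                 # config/prometheus-rules-cm -> prometheus-rules
    kubectl create configmap "$name" --from-file="$dir" -n "$ns" || return 1
  done
}
```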
The deployments directory contains the definitions of our Deployments and Services. We expose our Services via NodePort; however, you can edit the following files to remove the type: NodePort spec from the Services and use an Ingress instead. Both approaches can be found here.
- alertmanager-deploy-svc.yaml: Deployment and Service of alertmanager
- grafana-deploy-svc.yaml: Deployment and Service of Grafana, including dashboard/datasource imports
- node-exporter-ds.yaml: DaemonSet to export hardware and OS metrics
- prometheus-deploy-svc.yaml: Deployment and Service of Prometheus
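With NodePort Services, each UI is reachable at http://<node-ip>:<nodePort>. Kubernetes assigns the port unless the manifest pins one, so you can look it up with a JSONPath query. This sketch assumes each Service exposes a single port (only the first entry of .spec.ports is read); node_port is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the NodePort assigned to the first port of a Service.
node_port() {
  kubectl get service "$1" -n "${NAMESPACE:-monitoring}" \
    -o jsonpath='{.spec.ports[0].nodePort}'
}

# Usage, with the Service names from the deploy.sh output:
#   node_port grafana
#   node_port alertmanager
#   node_port prometheus-svc
```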
The scripts directory contains automation routines:
- deploy.sh: Initialize all resources
- undeploy.sh: Delete all resources
- update_alertmanager_config.sh: Updates the alertmanager ConfigMap with changes made in alertmanager-cm
- update_alertmanager_templates.sh: Updates the alertmanager-templates ConfigMap with changes made in alertmanager-templates-cm
- update_grafana_imports.sh: Updates the grafana-imports ConfigMap with changes made in grafana-imports-cm
- update_prometheus_config.sh: Updates the prometheus ConfigMap with changes made in prometheus-cm
- update_prometheus_rules.sh: Updates the prometheus-rules ConfigMap with changes made in prometheus-rules-cm
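One common way to update a ConfigMap in place is to regenerate it from its directory and pipe the result to kubectl replace. The sketch below is a hypothetical illustration of that pattern, not the actual update_*.sh scripts; it assumes kubectl 1.18+ for --dry-run=client (older releases use plain --dry-run):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: rebuild ConfigMap <name> from <dir> and replace the live object.
update_config_map() {
  local name="$1" dir="$2" ns="${NAMESPACE:-monitoring}"
  kubectl create configmap "$name" --from-file="$dir" -n "$ns" \
    --dry-run=client -o yaml | kubectl replace -f -
}

# Usage, from the repository root:
#   update_config_map prometheus-rules config/prometheus-rules-cm
```

The kubelet refreshes mounted ConfigMap volumes after a short delay (not for subPath mounts), but Prometheus and Alertmanager only pick up the new files after a reload (SIGHUP, or POST /-/reload when the lifecycle endpoint is enabled) or a restart.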
For questions or suggestions, contact: artmr@lsd.ufcg.edu.br