Monitoring tools for https://0l.network
This repository provides guides for both monitoring providers and node operators.
- Monitoring providers [MPs]: any party willing to provide monitoring services for 0L node operators by running monitoring tools such as the Prometheus stack.
- 0L node operators [OPs]: any party running any type of 0L node (validator/VFN or fullnode) who wants to monitor their nodes.
Prometheus is an open source application that scrapes real-time metrics to monitor events and can also send real-time alerts.
Grafana is an analytics and visualization tool for building interactive charts and graphs from the data and alerts collected by the monitoring tools.
The 0L diem node exports a set of Prometheus metrics that we would like to collect and use to build Grafana dashboards. These are exported on ports `9101` and `9012`. In addition to diem metrics, node operators can choose to expose system metrics like CPU, memory, storage, and others using Prometheus Node Exporter.
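To get a feel for what a node exposes, the text output of a metrics endpoint can be summarized with two small shell helpers. This is only a sketch: `sample_count` and `metric_names` are hypothetical names introduced here, and they assume the endpoint returns the standard Prometheus text format, where lines starting with `#` are HELP/TYPE comments.

```shell
# Count metric samples in Prometheus text-format output read from stdin.
# Comment lines (starting with '#') and blank lines are skipped.
sample_count() {
  grep -Evc '^(#|[[:space:]]*$)' || true
}

# List the distinct metric names found in the same kind of input.
# Everything from the first '{' or space onward (labels, value) is dropped.
metric_names() {
  grep -Ev '^(#|[[:space:]]*$)' | sed -E 's/[{ ].*$//' | sort -u
}
```

For example, `curl -s http://YOUR-IP:9101/metrics | sample_count` prints how many samples the diem endpoint currently exposes, and piping the same output to `metric_names` lists what is available for dashboards.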
Guides on how to set up Prometheus and Grafana instances can be found here:
- Prometheus [MPs]
- Grafana [MPs]
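On the provider side, each node operator host ends up as a pair of scrape targets. The helper below is a minimal sketch that prints a Prometheus scrape job for one host. The function name `scrape_job`, the job name `0l-nodes`, the `operator` label, and the example IP are assumptions made for illustration, not part of the guides linked above.

```shell
# Print a scrape job covering Node Exporter (9100) and diem metrics (9101)
# for a single host. Arguments: host IP, operator label (both hypothetical).
scrape_job() {
  host_ip="$1"
  operator="$2"
  cat <<EOF
scrape_configs:
  - job_name: "0l-nodes"
    static_configs:
      - targets: ["${host_ip}:9100", "${host_ip}:9101"]
        labels:
          operator: "${operator}"
EOF
}

# Example call with a documentation-range IP; prints the YAML to stdout.
scrape_job 203.0.113.7 example-operator
```

The output is meant to be reviewed and merged into an existing `prometheus.yml` by hand rather than written automatically.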
Node operators can follow the steps below to allow monitoring providers to collect metrics from their hosts.
- Pick your monitoring provider from the list below.
- Open ports `9100-9101` to `$PROMETHEUS_STATIC_IP` (and probably to your own IP as well). Depending on your host and firewall, you might need to enable this in different places: `ufw`, Digital Ocean Firewall, AWS Security Groups, etc.
- Install Node Exporter. This assumes you are running Ubuntu: `sudo apt update && sudo apt install prometheus-node-exporter`. Alternatively, use the manual setup.
- Confirm these endpoints are working:
  - `curl http://YOUR-IP:9100/metrics`
  - `curl http://YOUR-IP:9101/metrics`
- Share your validator account address, host IP(s), and a Discord handle with the monitoring provider.
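For hosts that use `ufw`, the port-opening step can be sketched as below. The helper `ufw_rule` is a hypothetical name. It only prints the rule so it can be reviewed before being run with `sudo`. The fallback IP is the provider's Prometheus static IP listed in this README.

```shell
# Print (do not run) the ufw rule that lets one scraper IP reach
# Node Exporter (9100) and diem metrics (9101).
# ufw needs an explicit protocol when a port range is given.
ufw_rule() {
  printf 'ufw allow from %s to any port 9100:9101 proto tcp\n' "$1"
}

# Uses $PROMETHEUS_STATIC_IP when set, otherwise the provider IP from this README.
ufw_rule "${PROMETHEUS_STATIC_IP:-85.215.101.127}"
```

After checking the printed rule, run it with `sudo`. Call the helper again with your own IP if you also want to query the endpoints yourself. Cloud firewalls such as Digital Ocean Firewall or AWS Security Groups need the equivalent rule configured in their own consoles.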
Example dashboards from Bᴺ 𝕊pace.
- Prometheus Static IP: 85.215.101.127
  Grafana URL: https://grafana.openlibra.space
  Auth: `viewer:viewer` (view only)
  Discord: @nourspace#6652
- Add specific todos for Prometheus and Grafana setup guides
- Consider using K8s operators and/or Helm charts to run Prometheus stack
- Use HTTPS and load balancers
- Link to and/or integrate other monitoring tools built by the 0L community
- Enable alerting on Grafana dashboards
Some tasks and questions from the HackMD document that need to be integrated into the current todos:
https://hackmd.io/9dxv7ZwYS1yOmBVSjSV2wg
- Security: We want to create our own node-exporter config to only send meaningful and safe system metrics.
- Decentralization: We are running the two instances on our own for now, but we are thinking about how to move this forward so that there is neither a single point of failure nor a single entity hosting everyone's metrics.
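One possible direction for the security item is to turn off Node Exporter's default collectors and re-enable only a short allow-list. The sketch below builds such an argument line. The collector list is an assumption about what counts as "meaningful and safe", `node_exporter_args` is a hypothetical helper, and the `--collector.disable-defaults` flag is only available in newer node_exporter releases, so check the version your distribution ships.

```shell
# Assumed allow-list of collectors; adjust to what you are comfortable exposing.
SAFE_COLLECTORS="cpu meminfo filesystem diskstats loadavg netdev"

# Print an ARGS line that disables all default collectors and
# re-enables only the ones listed above.
node_exporter_args() {
  args="--collector.disable-defaults"
  for c in $SAFE_COLLECTORS; do
    args="$args --collector.$c"
  done
  printf 'ARGS="%s"\n' "$args"
}

node_exporter_args
```

On Ubuntu, the packaged service is expected to read its arguments from `/etc/default/prometheus-node-exporter`. Treat that path as an assumption and confirm it on your host before pasting the printed line there and restarting the service.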
- Create Prometheus instance
- Install node-exporter on own node
- Test scraping own diem and node metrics
- Collect more IPs
- Install Grafana
- Provide viewer access
- Provide editor access
- Consider anon access? https://grafana.com/docs/grafana/latest/auth/grafana/#anonymous-authentication
- Create dashboards/panels
- Experimental
- Validator Monitoring
- System Monitoring: Node-Exporter dashboards
- Define more dashboards (Inspirations)
- Auto-discover node IPs
- Parse JSON metrics from port `6191`: http://35.184.98.21:6191/metrics
- Use port 80 for Grafana. Both should be fronted by a load balancer.
- Move everything to a proper setup on K8s: URLs, SSL, access, backups, ...