The demo consists of a Spring 1.5 application and a preconfigured Prometheus with Grafana and Alertmanager. To make it even easier, a request generator lets you stress the monitored endpoints with random errors and latency injections.
Start the application, Prometheus, Alertmanager, and Grafana with docker-compose:
make docker_build
make start
Access the services on the following ports:
- 8080 - our application
- 9090 - Prometheus web GUI
- 9093 - Alertmanager
- 3000 - Grafana
For single calls, see the targets prefixed with srv_ in the Makefile.
Generate traffic (open the Grafana dashboard to see the metrics):
make srv_wrk_random
- What can we learn from the graphs?
- Can we say something about our random calls?
- Naming: is it good?
Docker-compose mounts all configuration from the git repo, so you can change it locally on your laptop.
To reload the Prometheus configuration after changes:
make prometheus_reload_config
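What the make target does internally is not shown here. As a rough sketch of one common way to trigger a reload, the snippet below posts to Prometheus' `/-/reload` lifecycle endpoint. It assumes Prometheus listens on localhost:9090 and was started with `--web.enable-lifecycle` (required in Prometheus 2.x); the function names are hypothetical.

```python
import urllib.request

# Assumption: Prometheus is published on localhost:9090, as in the port list.
PROMETHEUS_URL = "http://localhost:9090"


def build_reload_request(base_url: str = PROMETHEUS_URL) -> urllib.request.Request:
    """Build the POST request for Prometheus' /-/reload lifecycle endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    # An empty body is enough; the endpoint only cares about the HTTP method.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_prometheus(base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status code.

    Raises urllib.error.HTTPError if Prometheus rejects the request, for
    example when the lifecycle API is disabled or the new config is invalid.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

Calling `reload_prometheus()` while the stack is running should return 200 if the new configuration was accepted.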
To reload the Grafana configuration, restart the Grafana container:
docker restart java-prom_grafana_1
Start the app and the Prometheus stack with docker-compose:
make start
Check the Makefile for example calls.
To use the traffic generator, you first need to install wrk:
make srv_wrk_random
order_mgmt_duration_seconds_sum{job=~".*"} or order_mgmt_database_duration_seconds_sum{job=~".*"} or order_mgmt_audit_duration_seconds_sum{job=~".*"}
based on weave blog (
sum(irate(order_mgmt_duration_seconds_count{job=~".*"}[1m])) by (status_code)
The following query gives the rate of requests returning 5xx status codes:
sum(irate(order_mgmt_duration_seconds_count{job=~".*", status_code=~"5.."}[1m]))
Request rate broken down by status_code:
sum(irate(order_mgmt_duration_seconds_count{job=~".*"}[1m])) by (status_code)
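The request-rate queries above can also be run outside the web GUI through Prometheus' HTTP API (`/api/v1/query`). The sketch below is a minimal illustration, assuming Prometheus on localhost:9090; the helper names and the derived error ratio are illustrative additions, not part of the demo.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is published on localhost:9090, as in the port list.
PROMETHEUS_URL = "http://localhost:9090"

BY_STATUS_QUERY = (
    'sum(irate(order_mgmt_duration_seconds_count{job=~".*"}[1m])) by (status_code)'
)


def build_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Return the instant-query URL for a PromQL expression."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_instant_vector(payload: dict) -> list:
    """Turn an instant-query response into a list of (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError("query failed: %s" % payload.get("error"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector, got %s" % data["resultType"])
    # Each sample looks like {"metric": {...}, "value": [timestamp, "number"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def error_ratio(by_status: list) -> float:
    """Share of the request rate whose status_code label starts with 5."""
    total = sum(value for _, value in by_status)
    errors = sum(
        value for labels, value in by_status
        if labels.get("status_code", "").startswith("5")
    )
    return errors / total if total else 0.0


def run_query(expr: str, base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> list:
    """Execute the query against a running Prometheus and parse the result."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

With the stack running and traffic being generated, `error_ratio(run_query(BY_STATUS_QUERY))` gives the fraction of requests that currently fail with a 5xx code.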
The following query gives the 5-minute moving 99th percentile request latency (histogram_quantile needs the _bucket series, which carry the le label):
histogram_quantile(0.99, sum(rate(order_mgmt_duration_seconds_bucket{job=~".*",ws="false"}[5m])) by (le))
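To build some intuition for what `histogram_quantile` returns, here is a simplified Python sketch of the idea: find the cumulative `le` bucket that contains the requested rank and interpolate linearly inside it. It is an approximation of Prometheus' behaviour for classic histograms, not its actual implementation, and the bucket boundaries in the usage note are made up for illustration.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs, like the
    `le` buckets of a Prometheus histogram, and must include the +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last one being +Inf")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations at all

    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break

    if math.isinf(upper):
        # The quantile lies above the highest finite bound; report that bound.
        return buckets[-2][0]

    # The lowest bucket is assumed to start at 0.
    lower, prev = (0.0, 0.0) if i == 0 else buckets[i - 1]
    if count == prev:
        return lower
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - prev) / (count - prev)
```

For example, with buckets `[(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]` the 0.95 quantile falls into the 0.5 to 1.0 second bucket and is interpolated to roughly 0.78 seconds. The estimate can never be more precise than the bucket layout allows, which is worth keeping in mind when reading the latency graphs.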