-
more metrics to consider
-
So, @mzabaluev, before we get too far into tools and implementation, I think we need to cover what our "resources" are and where they live in the code with respect to the USE/RED approach to monitoring (https://github.com/movementlabsxyz/sre/tree/main/monitoring). To me, this looks like a discussion where we identify each resource in the abstract, correlate it with code on main, and describe how USE/RED applies to it. Something like this:
-
I have started discussions on specific metrics:
Much remains unsettled before we can progress with a specific implementation, though.
-
Should there be a metric for the rate of successfully settled (or, inversely, rejected) blocks?
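To make the question concrete, here is a minimal std-only sketch of what such a metric could look like: two monotonic counters, from which a monitoring backend (or the node itself) can derive a rejection ratio. The `SettlementCounters` type and its method names are hypothetical and do not come from the node's code; a real implementation would more likely register these as Prometheus or OpenTelemetry counters.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical pair of monotonic counters for block settlement outcomes.
#[derive(Default)]
pub struct SettlementCounters {
    settled: AtomicU64,
    rejected: AtomicU64,
}

impl SettlementCounters {
    /// Records the outcome of one settlement attempt.
    pub fn record(&self, accepted: bool) {
        let counter = if accepted { &self.settled } else { &self.rejected };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// Total number of successfully settled blocks so far.
    pub fn settled(&self) -> u64 {
        self.settled.load(Ordering::Relaxed)
    }

    /// Total number of rejected blocks so far.
    pub fn rejected(&self) -> u64 {
        self.rejected.load(Ordering::Relaxed)
    }

    /// Fraction of attempts that were rejected, or `None` before any attempt.
    pub fn rejection_ratio(&self) -> Option<f64> {
        let rejected = self.rejected();
        let total = self.settled() + rejected;
        if total == 0 {
            None
        } else {
            Some(rejected as f64 / total as f64)
        }
    }
}
```

Exporting the two raw counters and computing the rate on the monitoring side keeps the node code simple and lets the rate window be chosen at query time.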
-
We can try using tokio-metrics for performance research.
-
Should we start by implementing OpenTelemetry push in the service binaries for some of the metrics and see if it works for us? @radupopa369, @l-monninger, are there tools we can readily deploy to analyze, e.g., TPS from an OpenTelemetry event series?
-
Technical details of implementation for https://github.com/movementlabsxyz/sre/discussions/63 will be discussed here.
Enablers
There are a few available solutions to expose metrics and timing information.
Timing information with the tracing framework
Already implemented in movement-timing.
Features:
Prometheus metrics
Features:
OpenTelemetry metrics
Features:
Node Instrumentation
To support SRE, the node should provide metrics for:
Number of transactions in flight
Gas per second
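As a rough illustration of how these two metrics differ in kind, the sketch below models transactions in flight as a gauge (it goes up and down) and gas as a monotonic counter from which a per-second rate is derived between two samples. It uses only the standard library; `NodeMetrics`, its methods, and `gas_per_second` are hypothetical names chosen for this example, not existing node APIs.

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical metrics holder shared between node tasks.
#[derive(Default)]
pub struct NodeMetrics {
    /// Gauge: transactions accepted but not yet finished.
    txs_in_flight: AtomicI64,
    /// Monotonic counter: total gas used by finished transactions.
    gas_used_total: AtomicU64,
}

impl NodeMetrics {
    /// Call when a transaction enters the node.
    pub fn tx_submitted(&self) {
        self.txs_in_flight.fetch_add(1, Ordering::Relaxed);
    }

    /// Call when a transaction finishes, with the gas it consumed.
    pub fn tx_finished(&self, gas_used: u64) {
        self.txs_in_flight.fetch_sub(1, Ordering::Relaxed);
        self.gas_used_total.fetch_add(gas_used, Ordering::Relaxed);
    }

    /// Current number of transactions in flight.
    pub fn txs_in_flight(&self) -> i64 {
        self.txs_in_flight.load(Ordering::Relaxed)
    }

    /// Total gas used since the process started.
    pub fn gas_used_total(&self) -> u64 {
        self.gas_used_total.load(Ordering::Relaxed)
    }
}

/// Gas per second between two samples of the monotonic gas counter.
/// Returns 0.0 for an empty interval instead of dividing by zero.
pub fn gas_per_second(prev_total: u64, curr_total: u64, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs <= 0.0 {
        return 0.0;
    }
    curr_total.saturating_sub(prev_total) as f64 / secs
}
```

In practice the rate would probably be computed by the monitoring backend from the exported counter, so the node only needs to expose the gauge and the running total.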
Ad-hoc instrumentation for development and troubleshooting
The tracing framework could be used for more fine-grained instrumentation to obtain measurements of specific flows.
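The idea of measuring a specific flow can be sketched without any external crates: a guard object starts a timer when a flow is entered and records the elapsed time when it is dropped, much like a span closing. This is a std-only stand-in for illustration, not the API of the tracing crate or of movement-timing; `FlowTimings`, `FlowGuard`, and the flow names are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical collector of per-flow timing samples.
#[derive(Default)]
pub struct FlowTimings {
    samples: Mutex<HashMap<&'static str, Vec<Duration>>>,
}

impl FlowTimings {
    /// Starts timing a flow; the sample is recorded when the guard is dropped.
    pub fn enter<'a>(&'a self, flow: &'static str) -> FlowGuard<'a> {
        FlowGuard { timings: self, flow, started: Instant::now() }
    }

    /// Records one timing sample for the given flow.
    pub fn record(&self, flow: &'static str, elapsed: Duration) {
        self.samples.lock().unwrap().entry(flow).or_default().push(elapsed);
    }

    /// Number of samples recorded for a flow.
    pub fn count(&self, flow: &str) -> usize {
        self.samples.lock().unwrap().get(flow).map_or(0, Vec::len)
    }

    /// Sum of all recorded durations for a flow.
    pub fn total(&self, flow: &str) -> Duration {
        self.samples
            .lock()
            .unwrap()
            .get(flow)
            .map_or(Duration::ZERO, |v| v.iter().sum::<Duration>())
    }
}

/// Guard that reports the elapsed time of one flow execution on drop.
pub struct FlowGuard<'a> {
    timings: &'a FlowTimings,
    flow: &'static str,
    started: Instant,
}

impl Drop for FlowGuard<'_> {
    fn drop(&mut self) {
        self.timings.record(self.flow, self.started.elapsed());
    }
}
```

A call site would hold the guard for the duration of the flow, for example `let _guard = timings.enter("execute_block");` at the top of the function being measured. With the tracing framework, the same role is played by spans and a subscriber layer that observes when they close.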
Infrastructure metrics
These common metrics should be collected by the deployment infrastructure.
Measurement challenges
This section lists measurement tasks that may require some engineering effort.