Support application performance management (APM) #153

pbolduc · 2023-04-07T00:12:25Z

Is your feature request related to a problem? Please describe.

When running COMS in OpenShift, there is no easy way to monitor the health of a COMS deployment.

- A dedicated readiness/health check endpoint.
- Metrics exposed in Prometheus format so they can be scraped by Sysdig or a similar metrics system.
- Logs should be configurable so they can be sent to other destinations, such as Splunk.
- Logs should be less verbose. Every request that completes is currently logged, which produces too much noise and data.
- Much of the verbose log data could instead be exposed as metrics (for example, a histogram of operation elapsed time, with tags for operation name, HTTP status code, and HTTP verb).
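
To make the last point concrete, here is a minimal sketch of a histogram for operation elapsed time, rendered in the Prometheus text exposition format. It is an illustration only and not part of COMS: the `Histogram` class, the `Labels` type, and any metric or operation names passed to it are hypothetical, and label values are not escaped. In practice an existing Prometheus client library would likely be a better fit than hand-rolled code.

```typescript
// Hypothetical label set, based on the tags suggested above.
type Labels = { operation: string; status: number; verb: string };

type Series = { labels: Labels; buckets: number[]; sum: number; count: number };

class Histogram {
  private readonly name: string;
  private readonly bounds: number[]; // bucket upper bounds in seconds, ascending
  private readonly series = new Map<string, Series>();

  constructor(name: string, bounds: number[]) {
    this.name = name;
    this.bounds = bounds;
  }

  // Record one elapsed-time observation for a given label combination.
  observe(labels: Labels, seconds: number): void {
    const key = `${labels.operation}|${labels.status}|${labels.verb}`;
    const existing = this.series.get(key);
    const s: Series =
      existing ?? { labels, buckets: this.bounds.map(() => 0), sum: 0, count: 0 };
    if (!existing) this.series.set(key, s);
    // Buckets are cumulative: count the value in every bucket whose bound is >= the value.
    for (let i = 0; i < this.bounds.length; i++) {
      if (seconds <= this.bounds[i]) s.buckets[i] += 1;
    }
    s.sum += seconds;
    s.count += 1;
  }

  // Render all series as Prometheus text exposition lines.
  render(): string {
    const lines: string[] = [`# TYPE ${this.name} histogram`];
    for (const s of this.series.values()) {
      const base = `operation="${s.labels.operation}",status="${s.labels.status}",verb="${s.labels.verb}"`;
      for (let i = 0; i < this.bounds.length; i++) {
        lines.push(`${this.name}_bucket{${base},le="${this.bounds[i]}"} ${s.buckets[i]}`);
      }
      // The +Inf bucket always equals the total observation count.
      lines.push(`${this.name}_bucket{${base},le="+Inf"} ${s.count}`);
      lines.push(`${this.name}_sum{${base}} ${s.sum}`);
      lines.push(`${this.name}_count{${base}} ${s.count}`);
    }
    return lines.join("\n") + "\n";
  }
}
```

A request-timing middleware could call `observe` once per completed request instead of writing a log line, and a metrics route could return the output of `render()` for a scraper to collect.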

See

TimCsaky · 2023-12-11T18:00:24Z

Update:
The hosted COMS service uses a GitHub Actions-based pipeline with OpenShift deployment templates (managed by Helm). These are included in the COMS repo (see the .github and charts directories).
The app containers do have liveness/readiness checks configured, which call the root path. We use the BC Gov Sysdig service to log and alert when those checks fail. I documented our Sysdig set-up for our hosted APIs. But perhaps we do need a dedicated 'health' endpoint. I will raise that with the devs.
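
For discussion, a dedicated health endpoint could aggregate a few dependency checks rather than only confirming that the root path responds. The sketch below is framework-agnostic and is not how COMS works today: `evaluateHealth`, `HealthCheck`, and `HealthReport` are hypothetical names, and which dependencies to check is left to the caller.

```typescript
// A check resolves to true when the dependency is reachable (hypothetical contract).
type HealthCheck = () => Promise<boolean>;

type HealthReport = {
  status: number; // HTTP status the endpoint would return
  body: { status: "ok" | "unavailable"; checks: Record<string, "pass" | "fail"> };
};

// Run all checks concurrently; a check that throws or resolves to false counts as a failure.
async function evaluateHealth(checks: Record<string, HealthCheck>): Promise<HealthReport> {
  const results: Record<string, "pass" | "fail"> = {};
  await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      try {
        results[name] = (await check()) ? "pass" : "fail";
      } catch {
        results[name] = "fail";
      }
    }),
  );
  const healthy = Object.values(results).every((r) => r === "pass");
  return {
    status: healthy ? 200 : 503,
    body: { status: healthy ? "ok" : "unavailable", checks: results },
  };
}
```

A route handler would call `evaluateHealth` with the checks it cares about and send `status` and `body` back as JSON. The readiness probe could then point at that route instead of the root path.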

We're using the Express/Winston logging middleware, which allows for different logging output levels. When we get time, we would like to include a fluent-bit container that sends application, access, and error logs (among others) to different outputs.
For now we only monitor HTTP errors using Sysdig.
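
As a rough illustration of how output levels can cut request-log noise, the sketch below assigns each completed request a level based on its status code and drops anything less severe than the configured level. The numeric priorities mirror Winston's default npm levels. The mapping from status code to level and the function names are assumptions made for this example, not the current COMS configuration.

```typescript
// Winston's default npm level priorities (lower number = more severe).
const LEVELS = { error: 0, warn: 1, info: 2, http: 3, verbose: 4, debug: 5, silly: 6 } as const;
type Level = keyof typeof LEVELS;

// Hypothetical mapping: server errors are errors, client errors are warnings,
// and every other completed request is logged at the noisy "http" level.
function levelForStatus(status: number): Level {
  if (status >= 500) return "error";
  if (status >= 400) return "warn";
  return "http";
}

// An entry is emitted only if it is at least as severe as the configured level.
function shouldLog(entry: Level, configured: Level): boolean {
  return LEVELS[entry] <= LEVELS[configured];
}
```

With the configured level set to `info`, successful requests are no longer written to the log, while 4xx and 5xx responses still are. Setting it to `http` restores per-request logging when it is needed for debugging.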

pbolduc changed the title ~~Provide instrumentation and diagnostics~~ Support application performance management (APM) Apr 7, 2023
