Jeremy Ho edited this page Mar 27, 2024 · 7 revisions

This page gives a brief overview of how we use Sysdig.


External Links

Access

In the Sysdig UI, resources are available within the scope of a Team, and we have one team per project. After logging in to Sysdig (using your Azure gov ID), switch the Team scope (from the link in the lower left-hand corner) to the team for the project whose resources you want to see. See the Sysdig docs for more details.

A SysdigTeam resource must be configured in each project's *-tools namespace in OpenShift to enable access. The BC Gov Platform team has a guide for [configuring access](https://docs.developer.gov.bc.ca/sysdig-monitor-setup-team/), and team settings are managed at https://app.sysdigcloud.com/#/settings/teams.

Sysdig team for each project:

| Project | Sysdig Team |
|---------|-------------|
| CDOGS | 2250c5-team |
| CHES | b160aa-team |
| COMS | bb17f9-team |
| BCBox | e7679d-team |
| PCNS | d9d78e-team |
| CHESS | 10d873-team |
| DGRSC | bb0279-team |
| CAVMS | 8035d1-team |

Sysdig API

Sysdig provides a REST API for managing our Sysdig configuration. Sysdig API spec: https://app.sysdigcloud.com/api/public/docs/index.html

To authenticate, get an API token from Sysdig under your user settings (the token is scoped to the current team), then add an `Authorization: Bearer <token>` header to your API requests.

For example, to list dashboards:

```http
GET https://app.sysdigcloud.com/api/v3/dashboards/list HTTP/1.1
Authorization: Bearer abcdefg-a574-4359-b6c9-fa3e2dc30acb
Accept: application/json
```
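The same request can be scripted with only the Python standard library. This is a minimal sketch: the helper names are hypothetical, and the shape of the returned JSON is whatever the Sysdig API spec defines for this endpoint, so the code simply parses and returns it.

```python
import json
import urllib.request

SYSDIG_BASE_URL = "https://app.sysdigcloud.com"


def build_dashboard_list_request(token: str) -> urllib.request.Request:
    """Build the GET request for /api/v3/dashboards/list with bearer-token auth."""
    return urllib.request.Request(
        f"{SYSDIG_BASE_URL}/api/v3/dashboards/list",
        headers={
            # Team-scoped API token copied from Sysdig user settings
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def list_dashboards(token: str) -> dict:
    """Send the request and return the parsed JSON body (shape per the Sysdig API spec)."""
    with urllib.request.urlopen(build_dashboard_list_request(token), timeout=30) as response:
        return json.load(response)
```

For example, `list_dashboards(os.environ["SYSDIG_API_TOKEN"])` reads the token from an environment variable (the variable name is an assumption) rather than hard-coding it in a script. Because tokens are team-scoped, switching to another project's team means using that team's token.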

All of our Sysdig resources (dashboards and alerts) have been saved to this wiki; see Saved Configurations below.

Dashboards

There is a library of pre-built dashboards for our projects. These show a range of data, including details of our deployments, HTTP traffic, applications and more. We also have our own dashboards for our most useful metrics. To find a dashboard, log in to Sysdig (using your Azure gov ID) and switch to the team context in which the project exists (see Access above).

Note: Sysdig agents collect 1-second samples and report data at a 10-second resolution, which is the finest granularity at which Sysdig Monitor stores data. See https://docs.sysdig.com/en/docs/sysdig-monitor/using-monitor/metrics/data-aggregation/
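To illustrate what that means for a dashboard, the sketch below collapses ten 1-second samples into one 10-second data point. It is only an illustration under an assumed aggregation: the function is hypothetical, and Sysdig chooses the actual aggregation per metric (see the data-aggregation doc linked above), so a simple mean is just one possibility.

```python
from statistics import mean
from typing import Callable, Sequence


def downsample(
    samples: Sequence[float],
    window: int = 10,
    aggregate: Callable[[Sequence[float]], float] = mean,
) -> list[float]:
    """Collapse consecutive windows of `window` 1-second samples into one value each.

    A trailing partial window (fewer than `window` samples) is still aggregated.
    """
    if window <= 0:
        raise ValueError("window must be a positive number of samples")
    return [aggregate(samples[i:i + window]) for i in range(0, len(samples), window)]
```

With 20 one-second samples `0..19`, the mean gives two points, `[4.5, 14.5]`, while `aggregate=max` gives `[9, 19]`. Either way, a spike lasting a second or two is blended into its 10-second bucket, so short bursts may look smaller on a dashboard than they really were.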

Alerts

The following alerts are sent to our team's shared inbox (Email notification channel) and to our #monitoring Discord channel (Custom Webhook notification channel):

  • PVC usage over 85%
  • PVC usage over 90%
  • patroni workloads ready < 3
  • patroni workloads ready < 2
  • HTTP 5xx errors from app containers*
  • OpenShift container waiting

*This alert only goes to our Discord #monitoring channel. Because Sysdig does not expose the full access log (including the client ID), we should use our fluent-bit > fluentd > Discord alerting process where it is available.
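The PVC and Patroni alerts above are tiered: two thresholds on the same signal. The hypothetical sketch below shows which alerts would be firing for a given state, assuming each alert is evaluated independently against the thresholds listed. It does not use Sysdig's actual metric names or alert query syntax.

```python
PVC_USAGE_THRESHOLDS = (85.0, 90.0)  # percent, from the alert list above
PATRONI_READY_THRESHOLDS = (3, 2)    # alert when ready workloads < N


def pvc_alerts_firing(used_bytes: int, capacity_bytes: int) -> list[str]:
    """Return the PVC usage alerts whose threshold is exceeded."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    usage_pct = 100.0 * used_bytes / capacity_bytes
    return [f"PVC usage over {t:g}%" for t in PVC_USAGE_THRESHOLDS if usage_pct > t]


def patroni_alerts_firing(ready_workloads: int) -> list[str]:
    """Return the Patroni readiness alerts whose condition holds."""
    return [f"patroni workloads ready < {n}" for n in PATRONI_READY_THRESHOLDS if ready_workloads < n]
```

At 88% usage only the 85% alert fires; at 95% both do. Likewise, losing one Patroni member trips the `< 3` alert, and losing a second trips `< 2` as well, which signals escalating severity.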

Discord notification body template:

```json
{
  "content": "Alert Name: {{@alert_name}}\nSeverity: {{@alert_severity}}\nDescription: {{@alert_description}}\nNamespace: {{@event_labels.kube_namespace_name}}\nEvent Entity: {{@event_entity}}\nMore Details: {{@event_url}}"
}
```
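Sysdig fills in the `{{@...}}` placeholders itself when it sends the webhook. When editing the template, it can help to preview the result locally. The sketch below is a hypothetical stand-in for Sysdig's substitution, not its real implementation. It JSON-escapes each value before inserting it, so values that contain quotes or newlines still produce a valid JSON body with the single `content` field that Discord webhooks accept.

```python
import json
import re

TEMPLATE = r'''{
  "content": "Alert Name: {{@alert_name}}\nSeverity: {{@alert_severity}}\nDescription: {{@alert_description}}\nNamespace: {{@event_labels.kube_namespace_name}}\nEvent Entity: {{@event_entity}}\nMore Details: {{@event_url}}"
}'''

# Matches {{@name}} and dotted names such as {{@event_labels.kube_namespace_name}}
PLACEHOLDER = re.compile(r"\{\{@([\w.]+)\}\}")


def render(template: str, values: dict[str, str]) -> dict:
    """Substitute placeholders (missing ones become empty) and parse the result as JSON."""

    def substitute(match: re.Match) -> str:
        # json.dumps escapes quotes/newlines; [1:-1] drops the surrounding quotes it adds
        return json.dumps(values.get(match.group(1), ""))[1:-1]

    return json.loads(PLACEHOLDER.sub(substitute, template))
```

If `render` raises a `json.JSONDecodeError`, the template itself is not valid JSON, which is worth catching before pasting it into the Sysdig notification channel settings.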

Saved Configurations

The following table contains links to our saved Sysdig configurations. Last updated 2023/12/05.

| Project | Dashboard | Alerts |
|---------|-----------|--------|
| CDOGS | cdogs_dashboard | cdogs_alerts.json |
| CHES | ches_dashboard.json | ches_alerts.json |
| COMS | coms_dashboard.json | coms_alerts.json |
| BCBox | bcbox_dashboard.json | bcbox_alerts.json |
| PCNS | pcns_dashboard.json | pcns_alerts.json |
| CHESS | chess_dashboard.json | chess_alerts.json |
| DGRSC | dgrsc_dashboard.json | dgrsc_alerts.json |
| CAVMS | cavms_dashboard.json | cavms_alerts.json |
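When refreshing these saved files, keeping the `<project>_dashboard.json` / `<project>_alerts.json` naming used in the table keeps the links stable. Below is a minimal sketch of that convention, with hypothetical helper names. The `payload` would be whatever JSON the Sysdig API returned, written out unchanged apart from pretty-printing.

```python
import json
from pathlib import Path


def config_path(directory: Path, project: str, kind: str) -> Path:
    """Return <directory>/<project>_<kind>.json, e.g. coms_dashboard.json."""
    if kind not in ("dashboard", "alerts"):
        raise ValueError(f"unexpected kind: {kind!r}")
    return directory / f"{project.lower()}_{kind}.json"


def save_config(directory: Path, project: str, kind: str, payload: dict) -> Path:
    """Write the exported JSON with stable formatting so wiki diffs stay readable."""
    path = config_path(directory, project, kind)
    path.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path
```

Sorting keys and using a fixed indent means that re-exporting an unchanged dashboard produces an identical file, so only real configuration changes show up in the wiki's revision history.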