Alert Response Guide

Brandon Cruz edited this page Dec 8, 2023 · 2 revisions

Generated for the BFD-2955 Alerts Audit. The following lists include all alerts created for the production environment.

AWS CloudWatch Alerts

The following alarms are configured in AWS CloudWatch and have various alarm actions. Highlighted rows, marked with == around the alarm action, are alerts that notify the #bfd-alerts Slack channel.

index alarmName alarmArn alarmDescription alarmConfigurationUpdatedTimestamp okActions alarmActions stateValue
0 bfd-pipeline-prod-load-exceeds-9am-est arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-pipeline-prod-load-exceeds-9am-est BFD Pipeline in prod environment failed to load data prior to this Monday 9 AM EST/EDT 2023-11-06 21:42:28.778000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-test'] OK
1 bfd-pipeline-prod-slo-ingestion-time-alert arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-pipeline-prod-slo-ingestion-time-alert BFD Pipeline in prod environment failed to load data within a 36 hour period 2023-10-04 17:05:53.155000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-test'] OK
2 bfd-pipeline-prod-slo-ingestion-time-warning arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-pipeline-prod-slo-ingestion-time-warning BFD Pipeline in prod environment failed to load data within a 24 hour period 2023-10-04 17:05:53.157000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-test'] OK
3 bfd-prod-fhir-healthy-hosts arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-prod-fhir-healthy-hosts No healthy hosts available for bfd-prod-fhir in APP-ENV: bfd-prod 2019-09-03 13:14:59.611000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
4 bfd-prod-pipeline-log-availability-1hr arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-prod-pipeline-log-availability-1hr Pipeline logs have not been submitted to CloudWatch in 1 hour, pipeline has likely shutdown in APP-ENV: bfd-prod 2022-12-08 19:15:21.528000+00:00 [] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
5 bfd-prod-pipeline-max-fiss-claim-latency-exceeded arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-prod-pipeline-max-fiss-claim-latency-exceeded fiss claim processing is falling behind (max latency exceeded) in APP-ENV: bfd-prod 2022-12-08 19:15:21.583000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-notices'] OK
6 bfd-prod-pipeline-max-mcs-claim-latency-exceeded arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-prod-pipeline-max-mcs-claim-latency-exceeded mcs claim processing is falling behind (max latency exceeded) in APP-ENV: bfd-prod 2022-12-08 19:15:21.537000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-notices'] OK
7 bfd-prod-pipeline-messages-datasetfailed arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-prod-pipeline-messages-datasetfailed Data set processing failed, pipeline has shut down in APP-ENV: bfd-prod 2022-12-17 13:38:40.811000+00:00 [] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
8 bfd-prod-pipeline-messages-error arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-prod-pipeline-messages-error Pipeline errors detected over 10 evaluation periods of 300 seconds in APP-ENV: bfd-prod 2022-12-17 14:03:17.479000+00:00 [] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
9 bfd-prod-server-log-availability-1hr arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-prod-server-log-availability-1hr BFD Server logs have not been submitted to CloudWatch in 1 hour, server has likely shutdown in APP-ENV: bfd-prod 2022-12-08 19:17:06.478000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
10 bfd-prod-server-query-logging-listener-warning arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-prod-server-query-logging-listener-warning BFD Server QueryLoggingListener has encountered an unknown query in APP-ENV: bfd-prod 2023-02-24 19:04:05.287000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
11 bfd-server-prod-slo-availability-failures-sum-5m-alert arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-availability-failures-sum-5m-alert The sum of failed availability checks exceeded or was equal to ALERT SLO threshold of 3 failures in 5 minute(s) for bfd-server in prod environment. 2023-02-24 19:04:06.648000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
12 bfd-server-prod-slo-availability-failures-sum-5m-warning arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-availability-failures-sum-5m-warning The sum of failed availability checks exceeded or was equal to WARNING SLO threshold of 1 failures in 5 minute(s) for bfd-server in prod environment. 2023-02-24 19:04:05.772000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
13 bfd-server-prod-slo-availability-uptime-percent-24hr-alert arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-availability-uptime-percent-24hr-alert The alarm transitioned to the ALARM state due to one of the following occurring: Percent uptime over 1 day(s) dropped below ALERT SLO threshold of 99% for bfd-server in prod environment. No data was reported by the availability checker Jenkins pipeline; the pipeline may have stopped running 2023-02-24 19:04:06.579000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
14 bfd-server-prod-slo-availability-uptime-percent-24hr-warning arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-availability-uptime-percent-24hr-warning The alarm transitioned to the ALARM state due to one of the following occurring: Percent uptime over 1 day(s) dropped below WARNING SLO threshold of 99.8% for bfd-server in prod environment. No data was reported by the availability checker Jenkins pipeline; the pipeline may have stopped running 2023-02-24 19:04:19.645000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
15 bfd-server-prod-slo-claim-no-resources-latency-mean-15m-alert arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-claim-no-resources-latency-mean-15m-alert /v*/fhir/Claim response with no resources returned mean 15 minute latency exceeded ALERT SLO threshold of 700 ms for bfd-server in prod environment. 2023-09-20 16:09:38.697000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
16 bfd-server-prod-slo-claim-no-resources-latency-mean-15m-warning arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-claim-no-resources-latency-mean-15m-warning /v*/fhir/Claim response with no resources returned mean 15 minute latency exceeded WARNING SLO threshold of 600 ms for bfd-server in prod environment. 2023-09-20 16:09:38.701000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
17 bfd-server-prod-slo-claimresponse-no-resources-latency-mean-15m-alert arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-claimresponse-no-resources-latency-mean-15m-alert /v*/fhir/ClaimResponse response with no resources returned mean 15 minute latency exceeded ALERT SLO threshold of 1100 ms for bfd-server in prod environment. 2023-09-20 16:09:38.783000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
18 bfd-server-prod-slo-claimresponse-no-resources-latency-mean-15m-warning arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-claimresponse-no-resources-latency-mean-15m-warning /v*/fhir/ClaimResponse response with no resources returned mean 15 minute latency exceeded WARNING SLO threshold of 1000 ms for bfd-server in prod environment. 2023-09-20 16:09:38.779000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
19 bfd-server-prod-slo-coverage-bulk-latency-99p-15m-alert-bcda arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-coverage-bulk-latency-99p-15m-alert-bcda /v*/fhir/Coverage response 99% 15 minute BULK latency exceeded ALERT SLO threshold of 22500 ms for partner bcda for bfd-server in prod environment. 2023-02-24 19:04:05.369000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
20 bfd-server-prod-slo-coverage-bulk-latency-99p-15m-alert-dpc arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-coverage-bulk-latency-99p-15m-alert-dpc /v*/fhir/Coverage response 99% 15 minute BULK latency exceeded ALERT SLO threshold of 15000 ms for partner dpc for bfd-server in prod environment. 2023-02-24 19:04:06.707000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
21 bfd-server-prod-slo-coverage-bulk-latency-99p-15m-warning-bcda arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-coverage-bulk-latency-99p-15m-warning-bcda /v*/fhir/Coverage response 99% 15 minute BULK latency exceeded WARNING SLO threshold of 1440 ms for partner bcda for bfd-server in prod environment. 2023-02-24 19:04:06.441000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
22 bfd-server-prod-slo-coverage-bulk-latency-99p-15m-warning-dpc arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-coverage-bulk-latency-99p-15m-warning-dpc /v*/fhir/Coverage response 99% 15 minute BULK latency exceeded WARNING SLO threshold of 1440 ms for partner dpc for bfd-server in prod environment. 2023-02-24 19:04:05.953000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
23 bfd-server-prod-slo-coverage-latency-mean-15m-alert arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-coverage-latency-mean-15m-alert /v*/fhir/Coverage response mean 15 minute latency exceeded ALERT SLO threshold of 260ms for bfd-server in prod environment. 2023-02-24 19:04:05.721000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
24 bfd-server-prod-slo-coverage-latency-mean-15m-warning arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-coverage-latency-mean-15m-warning /v*/fhir/Coverage response mean 15 minute latency exceeded WARNING SLO threshold of 180ms for bfd-server in prod environment. 2023-02-24 19:04:05.746000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
25 bfd-server-prod-slo-coverage-nonbulk-latency-99p-15m-alert-bb arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-coverage-nonbulk-latency-99p-15m-alert-bb /v*/fhir/Coverage response 99% 15 minute NON-BULK latency exceeded ALERT SLO threshold of 2050 ms for partner bb for bfd-server in prod environment. 2023-02-24 19:04:06.444000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
26 bfd-server-prod-slo-coverage-nonbulk-latency-99p-15m-warning-bb arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-coverage-nonbulk-latency-99p-15m-warning-bb /v*/fhir/Coverage response 99% 15 minute NON-BULK latency exceeded WARNING SLO threshold of 1440 ms for partner bb for bfd-server in prod environment. 2023-02-24 19:04:06.395000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
27 bfd-server-prod-slo-eob-no-resources-latency-mean-15m-alert arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-eob-no-resources-latency-mean-15m-alert /v*/fhir/ExplanationOfBenefit response with no resources returned mean 15 minute latency exceeded ALERT SLO threshold of 440 ms for bfd-server in prod environment. 2023-02-24 19:04:06.683000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
28 bfd-server-prod-slo-eob-no-resources-latency-mean-15m-warning arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-eob-no-resources-latency-mean-15m-warning /v*/fhir/ExplanationOfBenefit response with no resources returned mean 15 minute latency exceeded WARNING SLO threshold of 310 ms for bfd-server in prod environment. 2023-02-24 19:04:05.475000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
29 bfd-server-prod-slo-eob-with-resources-bulk-latency-99p-15m-alert-ab2d arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-eob-with-resources-bulk-latency-99p-15m-alert-ab2d /v*/fhir/ExplanationOfBenefit responses with resources returned 99% 15 minute BULK latency exceeded ALERT SLO threshold of 150000 ms for partner ab2d for bfd-server in prod environment. 2023-02-24 19:04:05.944000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
30 bfd-server-prod-slo-eob-with-resources-bulk-latency-99p-15m-alert-bcda arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-eob-with-resources-bulk-latency-99p-15m-alert-bcda /v*/fhir/ExplanationOfBenefit responses with resources returned 99% 15 minute BULK latency exceeded ALERT SLO threshold of 22500 ms for partner bcda for bfd-server in prod environment. 2023-02-24 19:04:06.223000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
31 bfd-server-prod-slo-eob-with-resources-bulk-latency-99p-15m-alert-dpc arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-eob-with-resources-bulk-latency-99p-15m-alert-dpc /v*/fhir/ExplanationOfBenefit responses with resources returned 99% 15 minute BULK latency exceeded ALERT SLO threshold of 15000 ms for partner dpc for bfd-server in prod environment. 2023-02-24 19:04:05.992000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
32 bfd-server-prod-slo-eob-with-resources-bulk-latency-per-kb-99p-15m-warning-ab2d arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-eob-with-resources-bulk-latency-per-kb-99p-15m-warning-ab2d /v*/fhir/ExplanationOfBenefit responses with resources returned 99% 15 minute BULK latency per KB exceeded WARNING SLO threshold of 480 ms/KB for partner ab2d for bfd-server in prod environment. 2023-02-24 19:04:05.780000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
33 bfd-server-prod-slo-eob-with-resources-bulk-latency-per-kb-99p-15m-warning-bcda arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-eob-with-resources-bulk-latency-per-kb-99p-15m-warning-bcda /v*/fhir/ExplanationOfBenefit responses with resources returned 99% 15 minute BULK latency per KB exceeded WARNING SLO threshold of 480 ms/KB for partner bcda for bfd-server in prod environment. 2023-02-24 19:04:16.244000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
34 bfd-server-prod-slo-eob-with-resources-bulk-latency-per-kb-99p-15m-warning-dpc arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-eob-with-resources-bulk-latency-per-kb-99p-15m-warning-dpc /v*/fhir/ExplanationOfBenefit responses with resources returned 99% 15 minute BULK latency per KB exceeded WARNING SLO threshold of 480 ms/KB for partner dpc for bfd-server in prod environment. 2023-02-24 19:04:06.267000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
35 bfd-server-prod-slo-eob-with-resources-latency-per-kb-mean-15m-alert arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-eob-with-resources-latency-per-kb-mean-15m-alert /v*/fhir/ExplanationOfBenefit response with resources returned mean 15 minute latency per KB exceeded ALERT SLO threshold of 450 ms/KB for bfd-server in prod environment. 2023-02-24 19:04:05.468000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
36 bfd-server-prod-slo-eob-with-resources-latency-per-kb-mean-15m-warning arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-eob-with-resources-latency-per-kb-mean-15m-warning /v*/fhir/ExplanationOfBenefit response with resources returned mean 15 minute latency per KB exceeded WARNING SLO threshold of 320 ms/KB for bfd-server in prod environment. 2023-02-24 19:04:06.126000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
37 bfd-server-prod-slo-eob-with-resources-nonbulk-latency-per-kb-99p-15m-alert-bb arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-eob-with-resources-nonbulk-latency-per-kb-99p-15m-alert-bb /v*/fhir/ExplanationOfBenefit responses with resources returned 99% 15 minute NON-BULK latency per KB exceeded ALERT SLO threshold of 690 ms for partner bb for bfd-server in prod environment. 2023-02-24 19:04:17.223000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
38 bfd-server-prod-slo-eob-with-resources-nonbulk-latency-per-kb-99p-15m-warning-bb arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-eob-with-resources-nonbulk-latency-per-kb-99p-15m-warning-bb /v*/fhir/ExplanationOfBenefit responses with resources returned 99% 15 minute NON-BULK latency per KB exceeded WARNING SLO threshold of 480 ms for partner bb for bfd-server in prod environment. 2023-02-24 19:04:06.374000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
39 bfd-server-prod-slo-http500-percent-1hr-alert arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-http500-percent-1hr-alert Percent HTTP 500 (error) responses over 1 hour(s) exceeded ALERT SLO threshold of 10% for bfd-server in prod environment. 2023-03-03 18:05:56.189000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
40 bfd-server-prod-slo-http500-percent-1hr-warning arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-http500-percent-1hr-warning Percent HTTP 500 (error) responses over 1 hour(s) exceeded WARNING SLO threshold of 1% for bfd-server in prod environment. 2023-03-03 18:05:56.251000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
41 bfd-server-prod-slo-http500-percent-24hr-alert arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-http500-percent-24hr-alert Percent HTTP 500 (error) responses over 24 hour(s) exceeded ALERT SLO threshold of 0.01% for bfd-server in prod environment. 2023-08-01 19:27:33.529000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
42 bfd-server-prod-slo-http500-percent-24hr-warning arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-http500-percent-24hr-warning Percent HTTP 500 (error) responses over 24 hour(s) exceeded WARNING SLO threshold of 0.001% for bfd-server in prod environment. 2023-02-24 19:04:06.323000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
43 bfd-server-prod-slo-patient-by-contract-count-4000-latency-99p-15m-alert-ab2d arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-patient-by-contract-count-4000-latency-99p-15m-alert-ab2d /v*/fhir/Patient (by contract, count 4000) response 99% 15 minute latency exceeded ALERT SLO threshold of 150000 ms for partner ab2d for bfd-server in prod environment. 2023-12-05 22:16:15.534000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
44 bfd-server-prod-slo-patient-by-contract-count-4000-latency-99p-15m-warning-ab2d arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-patient-by-contract-count-4000-latency-99p-15m-warning-ab2d /v*/fhir/Patient (by contract, count 4000) response 99% 15 minute latency exceeded WARNING SLO threshold of 44 seconds for partner ab2d for bfd-server in prod environment. 2023-12-05 22:16:15.567000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
45 bfd-server-prod-slo-patient-by-contract-count-4000-latency-mean-15m-alert arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-patient-by-contract-count-4000-latency-mean-15m-alert /v*/fhir/Patient (by contract, count 4000) response mean 15 minute latency exceeded ALERT SLO threshold of 40 seconds for bfd-server in prod environment. 2023-02-24 19:04:06.199000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
46 bfd-server-prod-slo-patient-by-contract-count-4000-latency-mean-15m-warning arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-patient-by-contract-count-4000-latency-mean-15m-warning /v*/fhir/Patient (by contract, count 4000) response mean 15 minute latency exceeded WARNING SLO threshold of 40 seconds for bfd-server in prod environment. 2023-02-24 19:04:05.828000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
47 bfd-server-prod-slo-patient-no-contract-bulk-latency-99p-15m-alert-bcda arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-patient-no-contract-bulk-latency-99p-15m-alert-bcda /v*/fhir/Patient (not by contract) response 99% 15 minute BULK latency exceeded ALERT SLO threshold of 22500 ms for partner bcda for bfd-server in prod environment. 2023-02-24 19:04:05.567000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
48 bfd-server-prod-slo-patient-no-contract-bulk-latency-99p-15m-alert-dpc arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-patient-no-contract-bulk-latency-99p-15m-alert-dpc /v*/fhir/Patient (not by contract) response 99% 15 minute BULK latency exceeded ALERT SLO threshold of 15000 ms for partner dpc for bfd-server in prod environment. 2023-02-24 19:04:19.114000+00:00 ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-ok'] [=='arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms'==] OK
49 bfd-server-prod-slo-patient-no-contract-bulk-latency-99p-15m-warning-bcda arn:aws:cloudwatch:us-east-1:577373831711:alarm:bfd-server-prod-slo-patient-no-contract-bulk-latency-99p-15m-warning-bcda /v*/fhir/Patient (not by contract) response 99% 15 minute BULK latency exceeded WARNING SLO threshold of 585 ms for partner bcda for bfd-server in prod environment. 2023-02-24 19:04:05.406000+00:00 [] ['arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings'] OK
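
The column names in the table above resemble fields of the CloudWatch DescribeAlarms response (AlarmName, AlarmActions, OKActions, StateValue). As a rough sketch, and assuming the highlighted rows are exactly those whose alarm actions include the `bfd-prod-cloudwatch-alarms` SNS topic (an inference from the table, not something this page states), the Python below groups alarm records by the topic they notify. The function name and the sample list are illustrative only; in practice the records might come from boto3's `describe_alarms` paginator.

```python
from collections import defaultdict

# Assumption: this topic is the one wired to the #bfd-alerts Slack channel,
# inferred from which rows are highlighted in the table above.
BFD_ALERTS_TOPIC = "arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms"
WARNINGS_TOPIC = (
    "arn:aws:sns:us-east-1:577373831711:bfd-prod-cloudwatch-alarms-slack-bfd-warnings"
)


def group_alarms_by_topic(alarms):
    """Map each SNS topic ARN to the sorted names of alarms that notify it.

    `alarms` is a list of dicts shaped like the `MetricAlarms` entries of a
    DescribeAlarms response; only AlarmName and AlarmActions are read.
    """
    by_topic = defaultdict(list)
    for alarm in alarms:
        for action_arn in alarm.get("AlarmActions", []):
            by_topic[action_arn].append(alarm["AlarmName"])
    return {topic: sorted(names) for topic, names in by_topic.items()}


# Three rows copied from the table above, reduced to the fields used here.
sample_alarms = [
    {"AlarmName": "bfd-prod-fhir-healthy-hosts",
     "AlarmActions": [BFD_ALERTS_TOPIC]},
    {"AlarmName": "bfd-prod-pipeline-messages-error",
     "AlarmActions": [BFD_ALERTS_TOPIC]},
    {"AlarmName": "bfd-server-prod-slo-http500-percent-1hr-warning",
     "AlarmActions": [WARNINGS_TOPIC]},
]

grouped = group_alarms_by_topic(sample_alarms)
for name in grouped[BFD_ALERTS_TOPIC]:
    print(name)
```

Grouping by topic rather than hard-coding a single filter makes it easy to see which alarms land in the warnings or notices channels as well.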

Splunk Alerts

The following are configured in Splunk's BFD App.

| Name | Description | Alert Type | Trigger | Cause | Action | Status |
| --- | --- | --- | --- | --- | --- | --- |
| BFD ETL - Data load longer than 48 hours | This alert runs every Monday morning and alerts if a data load has taken 48 hours or more to process. | Scheduled. Weekly, Monday at 10:00 AM ET | Data load has taken more than 48 hours to process | CCW file delay or pipeline failed to ingest | Check the S3 buckets to confirm whether data was ingested, and check whether the pipeline instance has issues | Disabled |
| BFD ETL - No data sets found to process | This alert runs every Monday morning at 6am (Eastern) and alerts if no data sets were found in the previous 2 days, e.g., no data sets were found over the weekend. | Scheduled. Weekly, Monday at 10:00 AM ET | No data sets were found over the weekend | CCW file delay or pipeline failed to ingest | Check the S3 buckets to confirm whether data was ingested, and check whether the pipeline instance has issues | Enabled |
| BFD NO LOG INGESTION FOR BFD APP SERVER 1h | This alert checks the source bluebutton-server-app-log.json and alerts if this log has a count of 0 in the last 1hr | Scheduled. Hourly, at 0 minutes past the hour. | Source bluebutton-server-app-log.json has a count of 0 in the last 1hr | | | Enabled |
| BFD NO LOG INGESTION FOR BFD WEB SERVER 1h | This alert checks the source bfd-server/access.json and alerts if this log has a count of 0 in the last 1hr | Scheduled. Hourly, at 0 minutes past the hour. | Source bfd-server/access.json has a count of 0 in the last 1hr | | | Enabled |
| BFD NO LOG INGESTION FOR ETL PIPELINE 1h | This alert checks the source bluebutton-data-pipeline.log and alerts if this log has a count of 0 in the last 1hr | Scheduled. Hourly, at 0 minutes past the hour. | Source bluebutton-data-pipeline.log has a count of 0 in the last 1hr | | | Enabled |
| BFD NO LOG INGESTION FOR HOST 1h | This alert checks the source /var/log/messages and alerts if this log has a count of 0 in the last 1hr | Scheduled. Hourly, at 0 minutes past the hour. | Source /var/log/messages has a count of 0 in the last 1hr | | | Enabled |
| BFD-CCW-FOUND-DATASETS-PROD | This alert checks the source bluebutton-data-pipeline.log for the message Found data set to process IN(BENEFICIARY, CARRIER, DME, HHA, HOSPICE, INPATIENT, OUTPATIENT, PDE, SNF) | Scheduled. */15 * * * * | Source bluebutton-data-pipeline.log contains the message Found data set to process IN(BENEFICIARY, CARRIER, DME, HHA, HOSPICE, INPATIENT, OUTPATIENT, PDE, SNF) | | | Enabled |
| BFD-CCW-JOB-FAILED-PROD | This alert checks the source bluebutton-data-pipeline.log for the message %PipelineJobFailure% | Scheduled. */15 * * * * | Source bluebutton-data-pipeline.log contains the message %PipelineJobFailure% | | | Enabled |
| BFD-CCW-LOADED-DATASET-PROD | This alert checks the source bluebutton-data-pipeline.log for the message Processed type IN(BENEFICIARY, CARRIER, DME, HHA, HOSPICE, INPATIENT, OUTPATIENT, PDE, SNF) | Scheduled. */15 * * * * | Source bluebutton-data-pipeline.log contains the message Processed type IN(BENEFICIARY, CARRIER, DME, HHA, HOSPICE, INPATIENT, OUTPATIENT, PDE, SNF) | | | Enabled |
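
The three BFD-CCW alerts above are scheduled with the cron expression `*/15 * * * *`. Assuming this follows standard five-field cron notation, where the first field is the minute, the alerts would run four times an hour. The helper below is a hypothetical illustration that expands only the simple minute forms (`*`, `*/N`, or a single number); it is not a full cron parser.

```python
def expand_minute_field(field):
    """Return the minutes (0-59) matched by a cron minute field.

    Handles only "*", "*/N", and a single integer; lists and ranges
    are out of scope for this sketch.
    """
    if field == "*":
        return list(range(60))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(0, 60, step))
    return [int(field)]


schedule = "*/15 * * * *"
minute_field = schedule.split()[0]  # first of the five cron fields
run_minutes = expand_minute_field(minute_field)
print(run_minutes)  # [0, 15, 30, 45]
```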

New Relic Alerts

The following alert conditions are configured in New Relic, but none are configured to send notifications. We may want to adjust them to notify the #bfd-alerts Slack channel.

| Name | Policy | Trigger | Cause | Action | Status |
| --- | --- | --- | --- | --- | --- |
| BFD Endpoints - Error Rate > 10% | BB2 - BFD - Prod | | | | Enabled |
| BFD Endpoints - FHIR | BB2 - BFD - Prod | | | | Disabled |
| BFD Endpoints - Health | BB2 - BFD - Prod | | | | Disabled |
| BFD Endpoints - Latency 50th Percentile > 6 seconds | BB2 - BFD - Prod | | | | Enabled |
| V1 - Anomaly Detected - API Response Time | BFD - Anomaly Detected - PROD | | | | Enabled |
| V1 - Anomaly Detected - Error Percentage | BFD - Anomaly Detected - PROD | | | | Enabled |
| V2 - Anomaly Detected - Avg Response Time | BFD - Anomaly Detected - PROD | | | | Enabled |
| V2 - Anomaly Detected - Error Percentage | BFD - Anomaly Detected - PROD | | | | Enabled |
| Error % | BFD - V1 API - SLO Violations - PROD | | | | Enabled |
| V1 - Coverage (p95) | BFD - V1 API - SLO Violations - PROD | | | | Enabled |
| V1 - EOB _since (p95) | BFD - V1 API - SLO Violations - PROD | | | | Enabled |
| V1 - EOB (p95) | BFD - V1 API - SLO Violations - PROD | | | | Enabled |
| V1 - Patient (p95) | BFD - V1 API - SLO Violations - PROD | | | | Enabled |
| V1 - Patient - by Contract and YearMonth - 4000 (p95) | BFD - V1 API - SLO Violations - PROD | | | | Enabled |
| Error % | BFD - V2 API - SLO Violations - PROD | | | | Disabled |
| V2 - Coverage (p95) | BFD - V2 API - SLO Violations - PROD | | | | Disabled |
| V2 - EOB _since (p95) | BFD - V2 API - SLO Violations - PROD | | | | Disabled |
| V2 - EOB (p95) | BFD - V2 API - SLO Violations - PROD | | | | Disabled |
| V2 - Patient (p95) | BFD - V2 API - SLO Violations - PROD | | | | Disabled |
| V2 - Patient - by Contract and YearMonth - 4000 (p95) | BFD - V2 API - SLO Violations - PROD | | | | Disabled |

Alert Policies

| Name | # of Alert Conditions | Alert Conditions | Notifications |
| --- | --- | --- | --- |
| BB2 - BFD - Prod | 4 | BFD Endpoints - Error Rate > 10%; BFD Endpoints - FHIR; BFD Endpoints - Health; BFD Endpoints - Latency 50th Percentile > 6 seconds | Disabled |
| BFD - Anomaly Detected - PROD | 4 | V1 - Anomaly Detected - API Response Time; V1 - Anomaly Detected - Error Percentage; V2 - Anomaly Detected - Avg Response Time; V2 - Anomaly Detected - Error Percentage | Disabled |
| BFD - V1 API - SLO Violations - PROD | 6 | Error %; V1 - Coverage (p95); V1 - EOB _since (p95); V1 - EOB (p95); V1 - Patient (p95); V1 - Patient - by Contract and YearMonth - 4000 (p95) | Disabled |
| BFD - V2 API - SLO Violations - PROD | 6 | Error %; V2 - Coverage (p95); V2 - EOB _since (p95); V2 - EOB (p95); V2 - Patient (p95); V2 - Patient - By Contract and YearMonth - 4000 (p95) | Disabled |
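
As a quick consistency check for an audit like this one, the per-policy condition counts in the policy table can be compared against the New Relic conditions listed earlier. The sketch below hard-codes the (policy, status) pairs from that list; the variable names are illustrative, and nothing here calls the New Relic API.

```python
from collections import Counter

# (policy, status) for each condition, copied from the conditions list above.
conditions = (
    [("BB2 - BFD - Prod", s)
     for s in ("Enabled", "Disabled", "Disabled", "Enabled")]
    + [("BFD - Anomaly Detected - PROD", "Enabled")] * 4
    + [("BFD - V1 API - SLO Violations - PROD", "Enabled")] * 6
    + [("BFD - V2 API - SLO Violations - PROD", "Disabled")] * 6
)

# Condition counts as stated in the policy table.
expected_counts = {
    "BB2 - BFD - Prod": 4,
    "BFD - Anomaly Detected - PROD": 4,
    "BFD - V1 API - SLO Violations - PROD": 6,
    "BFD - V2 API - SLO Violations - PROD": 6,
}

actual_counts = Counter(policy for policy, _ in conditions)
enabled_counts = Counter(
    policy for policy, status in conditions if status == "Enabled"
)

# Policies whose stated count disagrees with the listed conditions.
mismatches = {
    policy: (stated, actual_counts[policy])
    for policy, stated in expected_counts.items()
    if actual_counts[policy] != stated
}

for policy, stated in expected_counts.items():
    print(f"{policy}: {enabled_counts[policy]} of {stated} conditions enabled")
print("mismatches:", mismatches)
```

With the data as listed, the counts agree, and the V2 API SLO policy has no enabled conditions.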