feat: add CloudWatch alarms and alarms forwarding to BetterStack (#131)
* feat: add CloudWatch alarms and alarms forwarding to BetterStack

* fix: ignore topic encryption warning
xav authored Sep 14, 2023
1 parent 08136b5 commit b87566b
Showing 27 changed files with 653 additions and 256 deletions.
2 changes: 2 additions & 0 deletions .github/codeowners
@@ -0,0 +1,2 @@
* @Xav
* @Elyniss
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -7,7 +7,7 @@ repos:
# - id: terraform_tfsec
- id: terraform_docs
args:
- '--args=--lockfile=false'
- --args=--config=./terraform/.terraform-docs.yml

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
40 changes: 40 additions & 0 deletions terraform/.terraform-docs.yml
@@ -0,0 +1,40 @@
formatter: 'markdown table'

recursive:
enabled: true
path: .

output:
file: README.md
mode: inject
template: |-
<!-- BEGIN_TF_DOCS -->
{{ .Content }}
<!-- END_TF_DOCS -->
content: |
{{ .Header }}
{{ .Requirements }}
{{ .Providers }}
{{ .Modules }}
## Inputs
{{- $hideInputs := list "namespace" "region" "stage" "name" "delimiter" "attributes" "tags" "regex_replace_chars" "id_length_limit" "label_key_case" "label_value_case" "label_order" }}
{{- $filteredInputs := list -}}
{{- range .Module.Inputs -}}
{{- if not (has .Name $hideInputs) -}}
{{- $filteredInputs = append $filteredInputs . -}}
{{- end -}}
{{- end -}}
{{ if not $filteredInputs }}
No inputs.
{{ else }}
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
{{- range $filteredInputs }}
| {{ anchorNameMarkdown "input" .Name }} | {{ tostring .Description | sanitizeMarkdownTbl }} | {{ printf " " }}<pre lang="json">{{ tostring .Type | sanitizeMarkdownTbl }}</pre> | {{ printf " " }}<pre lang="json">{{ .GetValue | sanitizeMarkdownTbl }}</pre> | {{ printf " " }}{{ ternary .Required "yes" "no" }} |
{{- end }}
{{- end }}
{{ .Outputs }}
{{/** End of file fixer */}}
45 changes: 17 additions & 28 deletions terraform/README.md
@@ -12,7 +12,8 @@ Now you can apply the changes:

`terraform -chdir=terraform apply -var-file="vars/dev.tfvars"`

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
<!-- BEGIN_TF_DOCS -->

## Requirements

| Name | Version |
@@ -21,19 +22,18 @@ Now you can apply the changes:
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | >= 5.7 |
| <a name="requirement_grafana"></a> [grafana](#requirement\_grafana) | >= 2.1 |
| <a name="requirement_random"></a> [random](#requirement\_random) | 3.5.1 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | >= 5.7 |
| <a name="provider_aws"></a> [aws](#provider\_aws) | 5.12.0 |
| <a name="provider_random"></a> [random](#provider\_random) | 3.5.1 |
| <a name="provider_terraform"></a> [terraform](#provider\_terraform) | n/a |

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_cloudwatch"></a> [cloudwatch](#module\_cloudwatch) | ./cloudwatch | n/a |
| <a name="module_dns_certificate"></a> [dns\_certificate](#module\_dns\_certificate) | app.terraform.io/wallet-connect/dns/aws | 0.1.3 |
| <a name="module_ecs"></a> [ecs](#module\_ecs) | ./ecs | n/a |
| <a name="module_keystore"></a> [keystore](#module\_keystore) | ./docdb | n/a |
@@ -43,33 +43,22 @@ Now you can apply the changes:
| <a name="module_vpc_endpoints"></a> [vpc\_endpoints](#module\_vpc\_endpoints) | terraform-aws-modules/vpc/aws//modules/vpc-endpoints | 5.1 |
| <a name="module_vpc_flow_s3_bucket"></a> [vpc\_flow\_s3\_bucket](#module\_vpc\_flow\_s3\_bucket) | terraform-aws-modules/s3-bucket/aws | ~> 3.14 |

## Resources

| Name | Type |
|------|------|
| [aws_prometheus_workspace.prometheus](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/prometheus_workspace) | resource |
| [random_pet.this](https://registry.terraform.io/providers/hashicorp/random/3.5.1/docs/resources/pet) | resource |
| [aws_availability_zones.available](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/availability_zones) | data source |
| [aws_iam_policy_document.vpc_flow_log_s3](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [terraform_remote_state.dns](https://registry.terraform.io/providers/hashicorp/terraform/latest/docs/data-sources/remote_state) | data source |
| [terraform_remote_state.monitoring](https://registry.terraform.io/providers/hashicorp/terraform/latest/docs/data-sources/remote_state) | data source |
| [terraform_remote_state.org](https://registry.terraform.io/providers/hashicorp/terraform/latest/docs/data-sources/remote_state) | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_grafana_auth"></a> [grafana\_auth](#input\_grafana\_auth) | The API Token for the Grafana instance | `string` | `""` | no |
| <a name="input_image_version"></a> [image\_version](#input\_image\_version) | The version of the image to deploy | `string` | n/a | yes |
| <a name="input_keystore_primary_instance_class"></a> [keystore\_primary\_instance\_class](#input\_keystore\_primary\_instance\_class) | The instance class of the primary docdb instances | `string` | n/a | yes |
| <a name="input_keystore_primary_instance_count"></a> [keystore\_primary\_instance\_count](#input\_keystore\_primary\_instance\_count) | The number of primary docdb instances to deploy | `number` | n/a | yes |
| <a name="input_keystore_replica_instance_class"></a> [keystore\_replica\_instance\_class](#input\_keystore\_replica\_instance\_class) | The instance class of the replica docdb instances | `string` | n/a | yes |
| <a name="input_keystore_replica_instance_count"></a> [keystore\_replica\_instance\_count](#input\_keystore\_replica\_instance\_count) | The number of replica docdb instances to deploy | `number` | n/a | yes |
| <a name="input_name"></a> [name](#input\_name) | The name of the application | `string` | `"keyserver"` | no |
| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | The notification channels to send alerts to | `list(any)` | `[]` | no |
| <a name="input_region"></a> [region](#input\_region) | AWS region to deploy to | `string` | n/a | yes |

| <a name="input_betterstack_cloudwatch_webhook"></a> [betterstack\_cloudwatch\_webhook](#input\_betterstack\_cloudwatch\_webhook) | The BetterStack webhook to send CloudWatch alerts to | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_betterstack_prometheus_webhook"></a> [betterstack\_prometheus\_webhook](#input\_betterstack\_prometheus\_webhook) | The BetterStack webhook to send Prometheus alerts to | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_grafana_auth"></a> [grafana\_auth](#input\_grafana\_auth) | The API Token for the Grafana instance | <pre lang="json">string</pre> | <pre lang="json">""</pre> | no |
| <a name="input_image_version"></a> [image\_version](#input\_image\_version) | The version of the image to deploy | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_keystore_primary_instance_class"></a> [keystore\_primary\_instance\_class](#input\_keystore\_primary\_instance\_class) | The instance class of the primary docdb instances | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_keystore_primary_instance_count"></a> [keystore\_primary\_instance\_count](#input\_keystore\_primary\_instance\_count) | The number of primary docdb instances to deploy | <pre lang="json">number</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_keystore_replica_instance_class"></a> [keystore\_replica\_instance\_class](#input\_keystore\_replica\_instance\_class) | The instance class of the replica docdb instances | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_keystore_replica_instance_count"></a> [keystore\_replica\_instance\_count](#input\_keystore\_replica\_instance\_count) | The number of replica docdb instances to deploy | <pre lang="json">number</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_log_level"></a> [log\_level](#input\_log\_level) | Defines logging level for the application | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | The notification channels to send alerts to | <pre lang="json">list(any)</pre> | <pre lang="json">[]</pre> | no |
## Outputs

No outputs.
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->


<!-- END_TF_DOCS -->
43 changes: 43 additions & 0 deletions terraform/cloudwatch/README.md
@@ -0,0 +1,43 @@
# `cloudwatch` module

This module configures the CloudWatch alarms and forwards them to a webhook.

<!-- BEGIN_TF_DOCS -->

## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | ~> 1.0 |
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | ~> 5.7 |
## Providers

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | ~> 5.7 |
## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_cloudwatch"></a> [cloudwatch](#module\_cloudwatch) | app.terraform.io/wallet-connect/cloudwatch-constants/aws | 1.0.0 |
| <a name="module_this"></a> [this](#module\_this) | app.terraform.io/wallet-connect/label/null | 0.3.2 |

## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_context"></a> [context](#input\_context) | Single object for setting entire context at once.<br>See description of individual variables for details.<br>Leave string and numeric variables as `null` to use default value.<br>Individual variable settings (non-null) override settings in context object,<br>except for attributes and tags, which are merged. | <pre lang="json">any</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_docdb_cluster_id"></a> [docdb\_cluster\_id](#input\_docdb\_cluster\_id) | The DocumentDB cluster ID | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_docdb_cpu_threshold"></a> [docdb\_cpu\_threshold](#input\_docdb\_cpu\_threshold) | The DocumentDB CPU utilization alarm threshold, in percent | <pre lang="json">number</pre> | <pre lang="json">80</pre> | no |
| <a name="input_docdb_low_memory_throttling_threshold"></a> [docdb\_low\_memory\_throttling\_threshold](#input\_docdb\_low\_memory\_throttling\_threshold) | The DocumentDB low memory throttling alarm threshold in number of operations per period | <pre lang="json">number</pre> | <pre lang="json">2</pre> | no |
| <a name="input_docdb_memory_threshold"></a> [docdb\_memory\_threshold](#input\_docdb\_memory\_threshold) | The DocumentDB available memory alarm threshold in GiB | <pre lang="json">number</pre> | <pre lang="json">4</pre> | no |
| <a name="input_ecs_cluster_name"></a> [ecs\_cluster\_name](#input\_ecs\_cluster\_name) | The name of the ECS cluster running the application | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_ecs_cpu_threshold"></a> [ecs\_cpu\_threshold](#input\_ecs\_cpu\_threshold) | The ECS CPU utilization alarm threshold, in percent | <pre lang="json">number</pre> | <pre lang="json">80</pre> | no |
| <a name="input_ecs_memory_threshold"></a> [ecs\_memory\_threshold](#input\_ecs\_memory\_threshold) | The ECS memory utilization alarm threshold, in percent | <pre lang="json">number</pre> | <pre lang="json">80</pre> | no |
| <a name="input_ecs_service_name"></a> [ecs\_service\_name](#input\_ecs\_service\_name) | The name of the ECS service running the application | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_webhook_url"></a> [webhook\_url](#input\_webhook\_url) | The URL of the webhook to be called on alarms | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
## Outputs

No outputs.


<!-- END_TF_DOCS -->
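
For orientation, a call site in the root module might look roughly like the sketch below. The `./cloudwatch` source and the `betterstack_cloudwatch_webhook` variable appear in the root README; everything else is an assumption made for illustration, in particular `module.this` as the context source and the output names on `module.ecs` and `module.keystore`, none of which are shown in this diff.

```hcl
# Hypothetical call site in the root module. Names marked "assumed"
# do not appear in this diff and may differ in the real code.
module "cloudwatch" {
  source = "./cloudwatch"

  # Shared label context (assumed to come from a root-level label module).
  context = module.this

  # Required inputs, as listed in the module README.
  webhook_url      = var.betterstack_cloudwatch_webhook
  ecs_cluster_name = module.ecs.ecs_cluster_name # assumed output name
  ecs_service_name = module.ecs.ecs_service_name # assumed output name
  docdb_cluster_id = module.keystore.cluster_id  # assumed output name

  # Optional thresholds; the values shown are the documented defaults.
  ecs_cpu_threshold      = 80
  ecs_memory_threshold   = 80
  docdb_cpu_threshold    = 80
  docdb_memory_threshold = 4
}
```

Because every threshold has a default, a minimal call only needs the five required inputs.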
65 changes: 65 additions & 0 deletions terraform/cloudwatch/alarms_docdb.tf
@@ -0,0 +1,65 @@
resource "aws_cloudwatch_metric_alarm" "docdb_cpu_utilization" {
alarm_name = "${local.alarm_prefix} - DocumentDB CPU Utilization"
alarm_description = "${local.alarm_prefix} - DocumentDB CPU utilization is high (over ${var.docdb_cpu_threshold}%)"

namespace = module.cloudwatch.namespaces.DocumentDB
dimensions = {
DBClusterIdentifier = var.docdb_cluster_id
}
metric_name = module.cloudwatch.metrics.DocumentDB.CPUUtilization

evaluation_periods = local.evaluation_periods
period = local.period

statistic = module.cloudwatch.statistics.Average
comparison_operator = module.cloudwatch.operators.GreaterThanOrEqualToThreshold
threshold = var.docdb_cpu_threshold
treat_missing_data = "breaching"

alarm_actions = [aws_sns_topic.webhook.arn]
insufficient_data_actions = [aws_sns_topic.webhook.arn]
}

resource "aws_cloudwatch_metric_alarm" "docdb_available_memory" {
alarm_name = "${local.alarm_prefix} - DocumentDB Available Memory"
alarm_description = "${local.alarm_prefix} - DocumentDB available memory is low (less than ${var.docdb_memory_threshold}GiB)"

namespace = module.cloudwatch.namespaces.DocumentDB
dimensions = {
DBClusterIdentifier = var.docdb_cluster_id
}
metric_name = module.cloudwatch.metrics.DocumentDB.FreeableMemory

evaluation_periods = local.evaluation_periods
period = local.period

statistic = module.cloudwatch.statistics.Average
comparison_operator = module.cloudwatch.operators.LessThanOrEqualToThreshold
threshold = var.docdb_memory_threshold * pow(1024, 3) # GiB to bytes; FreeableMemory is reported in bytes
treat_missing_data = "breaching"

alarm_actions = [aws_sns_topic.webhook.arn]
insufficient_data_actions = [aws_sns_topic.webhook.arn]
}

resource "aws_cloudwatch_metric_alarm" "docdb_low_memory_throttling" {
alarm_name = "${local.alarm_prefix} - DocumentDB Low Memory Throttling"
alarm_description = "${local.alarm_prefix} - DocumentDB is throttling operations due to low memory"

namespace = module.cloudwatch.namespaces.DocumentDB
dimensions = {
DBClusterIdentifier = var.docdb_cluster_id
}
metric_name = module.cloudwatch.metrics.DocumentDB.LowMemNumOperationsThrottled

evaluation_periods = local.evaluation_periods
period = local.period

statistic = module.cloudwatch.statistics.Maximum
comparison_operator = module.cloudwatch.operators.GreaterThanThreshold
threshold = var.docdb_low_memory_throttling_threshold
treat_missing_data = "breaching"

alarm_actions = [aws_sns_topic.webhook.arn]
insufficient_data_actions = [aws_sns_topic.webhook.arn]
}
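
The available-memory alarm compares `FreeableMemory`, which CloudWatch reports in bytes, against a threshold that the variable description gives in GiB. The unit conversion can be checked in isolation with a sketch like the one below. The local names are hypothetical and the value `4` is the documented default for `docdb_memory_threshold`.

```hcl
# Illustrative locals only; none of these names exist in the module.
locals {
  threshold_gib = 4 # documented default of docdb_memory_threshold

  bytes_per_gib = pow(1024, 3) # 1073741824 (binary gibibyte)
  bytes_per_gb  = pow(1000, 3) # 1000000000 (decimal gigabyte)

  # 4 GiB expressed in bytes: 4294967296
  threshold_bytes_gib = local.threshold_gib * local.bytes_per_gib

  # 4 GB expressed in bytes: 4000000000
  threshold_bytes_gb = local.threshold_gib * local.bytes_per_gb
}
```

The two factors differ by roughly 7%, so the multiplier should match whichever unit the variable description and the alarm description state.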
45 changes: 45 additions & 0 deletions terraform/cloudwatch/alarms_ecs.tf
@@ -0,0 +1,45 @@
resource "aws_cloudwatch_metric_alarm" "ecs_cpu_utilization" {
alarm_name = "${local.alarm_prefix} - ECS CPU Utilization"
alarm_description = "${local.alarm_prefix} - ECS CPU utilization is high (over ${var.ecs_cpu_threshold}%)"

namespace = module.cloudwatch.namespaces.ECS
dimensions = {
ClusterName = var.ecs_cluster_name
ServiceName = var.ecs_service_name
}
metric_name = module.cloudwatch.metrics.ECS.CPUUtilization

evaluation_periods = local.evaluation_periods
period = local.period

statistic = module.cloudwatch.statistics.Average
comparison_operator = module.cloudwatch.operators.GreaterThanOrEqualToThreshold
threshold = var.ecs_cpu_threshold
treat_missing_data = "breaching"

alarm_actions = [aws_sns_topic.webhook.arn]
insufficient_data_actions = [aws_sns_topic.webhook.arn]
}

resource "aws_cloudwatch_metric_alarm" "ecs_mem_utilization" {
alarm_name = "${local.alarm_prefix} - ECS Memory Utilization"
alarm_description = "${local.alarm_prefix} - ECS Memory utilization is high (over ${var.ecs_memory_threshold}%)"

namespace = module.cloudwatch.namespaces.ECS
dimensions = {
ClusterName = var.ecs_cluster_name
ServiceName = var.ecs_service_name
}
metric_name = module.cloudwatch.metrics.ECS.MemoryUtilization

evaluation_periods = local.evaluation_periods
period = local.period

statistic = module.cloudwatch.statistics.Average
comparison_operator = module.cloudwatch.operators.GreaterThanOrEqualToThreshold
threshold = var.ecs_memory_threshold
treat_missing_data = "breaching"

alarm_actions = [aws_sns_topic.webhook.arn]
insufficient_data_actions = [aws_sns_topic.webhook.arn]
}
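
The alarms above refer to `local.alarm_prefix`, `local.evaluation_periods`, `local.period`, and `aws_sns_topic.webhook`, which are defined in files not shown in this excerpt. A minimal sketch of what such definitions could look like follows. All values and names here are assumptions, including the `id` output of the label module, and this is not the commit's actual code.

```hcl
# Illustrative only: the real definitions live in files not shown here.
locals {
  alarm_prefix       = module.this.id # assumed output of the label module
  evaluation_periods = 1              # assumed: alarm after one breaching period
  period             = 60             # assumed: seconds per evaluation period
}

# Topic that every alarm publishes to through alarm_actions and
# insufficient_data_actions.
resource "aws_sns_topic" "webhook" {
  name = "cloudwatch-webhook" # assumed name
}

# HTTPS subscription that forwards alarm notifications to the webhook URL.
resource "aws_sns_topic_subscription" "webhook" {
  topic_arn = aws_sns_topic.webhook.arn
  protocol  = "https"
  endpoint  = var.webhook_url
}
```

Routing every alarm through one topic keeps the webhook URL in a single place, so changing the destination does not require touching the alarm resources.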