feat: add CloudWatch alarms and alarms forwarding to BetterStack #131

Merged
merged 3 commits into from
Sep 14, 2023
Changes from 2 commits
2 changes: 2 additions & 0 deletions .github/codeowners
@@ -0,0 +1,2 @@
* @Xav
* @Elyniss
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -7,7 +7,7 @@ repos:
       # - id: terraform_tfsec
       - id: terraform_docs
         args:
-          - '--args=--lockfile=false'
+          - --args=--config=./terraform/.terraform-docs.yml
 
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v4.4.0
40 changes: 40 additions & 0 deletions terraform/.terraform-docs.yml
@@ -0,0 +1,40 @@
formatter: 'markdown table'

recursive:
  enabled: true
  path: .

output:
  file: README.md
  mode: inject
  template: |-
    <!-- BEGIN_TF_DOCS -->
    {{ .Content }}
    <!-- END_TF_DOCS -->

content: |
  {{ .Header }}
  {{ .Requirements }}
  {{ .Providers }}
  {{ .Modules }}

  ## Inputs
  {{- $hideInputs := list "namespace" "region" "stage" "name" "delimiter" "attributes" "tags" "regex_replace_chars" "id_length_limit" "label_key_case" "label_value_case" "label_order" }}
  {{- $filteredInputs := list -}}
  {{- range .Module.Inputs -}}
    {{- if not (has .Name $hideInputs) -}}
      {{- $filteredInputs = append $filteredInputs . -}}
    {{- end -}}
  {{- end -}}
  {{ if not $filteredInputs }}

  No inputs.
  {{ else }}
  | Name | Description | Type | Default | Required |
  |------|-------------|------|---------|:--------:|
  {{- range $filteredInputs }}
  | {{ anchorNameMarkdown "input" .Name }} | {{ tostring .Description | sanitizeMarkdownTbl }} | {{ printf " " }}<pre lang="json">{{ tostring .Type | sanitizeMarkdownTbl }}</pre> | {{ printf " " }}<pre lang="json">{{ .GetValue | sanitizeMarkdownTbl }}</pre> | {{ printf " " }}{{ ternary .Required "yes" "no" }} |
  {{- end }}
  {{- end }}
  {{ .Outputs }}
  {{/** End of file fixer */}}
45 changes: 17 additions & 28 deletions terraform/README.md
@@ -12,7 +12,8 @@ Now you can apply the changes:

`terraform -chdir=terraform apply -var-file="vars/dev.tfvars"`

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
<!-- BEGIN_TF_DOCS -->

## Requirements

| Name | Version |
@@ -21,19 +22,18 @@ Now you can apply the changes:
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | >= 5.7 |
| <a name="requirement_grafana"></a> [grafana](#requirement\_grafana) | >= 2.1 |
| <a name="requirement_random"></a> [random](#requirement\_random) | 3.5.1 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | >= 5.7 |
| <a name="provider_aws"></a> [aws](#provider\_aws) | 5.12.0 |
| <a name="provider_random"></a> [random](#provider\_random) | 3.5.1 |
| <a name="provider_terraform"></a> [terraform](#provider\_terraform) | n/a |

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_cloudwatch"></a> [cloudwatch](#module\_cloudwatch) | ./cloudwatch | n/a |
| <a name="module_dns_certificate"></a> [dns\_certificate](#module\_dns\_certificate) | app.terraform.io/wallet-connect/dns/aws | 0.1.3 |
| <a name="module_ecs"></a> [ecs](#module\_ecs) | ./ecs | n/a |
| <a name="module_keystore"></a> [keystore](#module\_keystore) | ./docdb | n/a |
@@ -43,33 +43,22 @@ Now you can apply the changes:
| <a name="module_vpc_endpoints"></a> [vpc\_endpoints](#module\_vpc\_endpoints) | terraform-aws-modules/vpc/aws//modules/vpc-endpoints | 5.1 |
| <a name="module_vpc_flow_s3_bucket"></a> [vpc\_flow\_s3\_bucket](#module\_vpc\_flow\_s3\_bucket) | terraform-aws-modules/s3-bucket/aws | ~> 3.14 |

## Resources

| Name | Type |
|------|------|
| [aws_prometheus_workspace.prometheus](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/prometheus_workspace) | resource |
| [random_pet.this](https://registry.terraform.io/providers/hashicorp/random/3.5.1/docs/resources/pet) | resource |
| [aws_availability_zones.available](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/availability_zones) | data source |
| [aws_iam_policy_document.vpc_flow_log_s3](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [terraform_remote_state.dns](https://registry.terraform.io/providers/hashicorp/terraform/latest/docs/data-sources/remote_state) | data source |
| [terraform_remote_state.monitoring](https://registry.terraform.io/providers/hashicorp/terraform/latest/docs/data-sources/remote_state) | data source |
| [terraform_remote_state.org](https://registry.terraform.io/providers/hashicorp/terraform/latest/docs/data-sources/remote_state) | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_grafana_auth"></a> [grafana\_auth](#input\_grafana\_auth) | The API Token for the Grafana instance | `string` | `""` | no |
| <a name="input_image_version"></a> [image\_version](#input\_image\_version) | The version of the image to deploy | `string` | n/a | yes |
| <a name="input_keystore_primary_instance_class"></a> [keystore\_primary\_instance\_class](#input\_keystore\_primary\_instance\_class) | The instance class of the primary docdb instances | `string` | n/a | yes |
| <a name="input_keystore_primary_instance_count"></a> [keystore\_primary\_instance\_count](#input\_keystore\_primary\_instance\_count) | The number of primary docdb instances to deploy | `number` | n/a | yes |
| <a name="input_keystore_replica_instance_class"></a> [keystore\_replica\_instance\_class](#input\_keystore\_replica\_instance\_class) | The instance class of the replica docdb instances | `string` | n/a | yes |
| <a name="input_keystore_replica_instance_count"></a> [keystore\_replica\_instance\_count](#input\_keystore\_replica\_instance\_count) | The number of replica docdb instances to deploy | `number` | n/a | yes |
| <a name="input_name"></a> [name](#input\_name) | The name of the application | `string` | `"keyserver"` | no |
| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | The notification channels to send alerts to | `list(any)` | `[]` | no |
| <a name="input_region"></a> [region](#input\_region) | AWS region to deploy to | `string` | n/a | yes |

| <a name="input_betterstack_cloudwatch_webhook"></a> [betterstack\_cloudwatch\_webhook](#input\_betterstack\_cloudwatch\_webhook) | The BetterStack webhook to send CloudWatch alerts to | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_betterstack_prometheus_webhook"></a> [betterstack\_prometheus\_webhook](#input\_betterstack\_prometheus\_webhook) | The BetterStack webhook to send Prometheus alerts to | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_grafana_auth"></a> [grafana\_auth](#input\_grafana\_auth) | The API Token for the Grafana instance | <pre lang="json">string</pre> | <pre lang="json">""</pre> | no |
| <a name="input_image_version"></a> [image\_version](#input\_image\_version) | The version of the image to deploy | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_keystore_primary_instance_class"></a> [keystore\_primary\_instance\_class](#input\_keystore\_primary\_instance\_class) | The instance class of the primary docdb instances | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_keystore_primary_instance_count"></a> [keystore\_primary\_instance\_count](#input\_keystore\_primary\_instance\_count) | The number of primary docdb instances to deploy | <pre lang="json">number</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_keystore_replica_instance_class"></a> [keystore\_replica\_instance\_class](#input\_keystore\_replica\_instance\_class) | The instance class of the replica docdb instances | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_keystore_replica_instance_count"></a> [keystore\_replica\_instance\_count](#input\_keystore\_replica\_instance\_count) | The number of replica docdb instances to deploy | <pre lang="json">number</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_log_level"></a> [log\_level](#input\_log\_level) | Defines logging level for the application | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | The notification channels to send alerts to | <pre lang="json">list(any)</pre> | <pre lang="json">[]</pre> | no |
## Outputs

No outputs.
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->


<!-- END_TF_DOCS -->
43 changes: 43 additions & 0 deletions terraform/cloudwatch/README.md
@@ -0,0 +1,43 @@
# `cloudwatch` module

This module configures the CloudWatch alarms and forwards them to a webhook.

<!-- BEGIN_TF_DOCS -->

## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | ~> 1.0 |
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | ~> 5.7 |
## Providers

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | ~> 5.7 |
## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_cloudwatch"></a> [cloudwatch](#module\_cloudwatch) | app.terraform.io/wallet-connect/cloudwatch-constants/aws | 1.0.0 |
| <a name="module_this"></a> [this](#module\_this) | app.terraform.io/wallet-connect/label/null | 0.3.2 |

## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_context"></a> [context](#input\_context) | Single object for setting entire context at once.<br>See description of individual variables for details.<br>Leave string and numeric variables as `null` to use default value.<br>Individual variable settings (non-null) override settings in context object,<br>except for attributes and tags, which are merged. | <pre lang="json">any</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_docdb_cluster_id"></a> [docdb\_cluster\_id](#input\_docdb\_cluster\_id) | The DocumentDB cluster ID | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_docdb_cpu_threshold"></a> [docdb\_cpu\_threshold](#input\_docdb\_cpu\_threshold) | The DocumentDB CPU utilization alarm threshold in percent | <pre lang="json">number</pre> | <pre lang="json">80</pre> | no |
| <a name="input_docdb_low_memory_throttling_threshold"></a> [docdb\_low\_memory\_throttling\_threshold](#input\_docdb\_low\_memory\_throttling\_threshold) | The DocumentDB low memory throttling alarm threshold in number of operations per period | <pre lang="json">number</pre> | <pre lang="json">2</pre> | no |
| <a name="input_docdb_memory_threshold"></a> [docdb\_memory\_threshold](#input\_docdb\_memory\_threshold) | The DocumentDB available memory alarm threshold in GiB | <pre lang="json">number</pre> | <pre lang="json">4</pre> | no |
| <a name="input_ecs_cluster_name"></a> [ecs\_cluster\_name](#input\_ecs\_cluster\_name) | The name of the ECS cluster running the application | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_ecs_cpu_threshold"></a> [ecs\_cpu\_threshold](#input\_ecs\_cpu\_threshold) | The ECS CPU utilization alarm threshold in percent | <pre lang="json">number</pre> | <pre lang="json">80</pre> | no |
| <a name="input_ecs_memory_threshold"></a> [ecs\_memory\_threshold](#input\_ecs\_memory\_threshold) | The ECS memory utilization alarm threshold in percent | <pre lang="json">number</pre> | <pre lang="json">80</pre> | no |
| <a name="input_ecs_service_name"></a> [ecs\_service\_name](#input\_ecs\_service\_name) | The name of the ECS service running the application | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
| <a name="input_webhook_url"></a> [webhook\_url](#input\_webhook\_url) | The URL of the webhook to be called on alarms | <pre lang="json">string</pre> | <pre lang="json">n/a</pre> | yes |
## Outputs

No outputs.


<!-- END_TF_DOCS -->
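
A minimal sketch of how the root module might call this module, based only on the inputs listed in the table above. The `module.this.context` reference and the output names on `module.ecs` and `module.keystore` are assumptions for illustration; the actual root configuration is not shown here and may differ.

```hcl
# Hypothetical root-module call; not taken from the PR itself.
module "cloudwatch" {
  source = "./cloudwatch"

  # Assumption: the root module also instantiates a `this` label module.
  context = module.this.context

  # Webhook that receives the alarm notifications (root input).
  webhook_url = var.betterstack_cloudwatch_webhook

  # Assumption: these output names are illustrative placeholders.
  ecs_cluster_name = module.ecs.ecs_cluster_name
  ecs_service_name = module.ecs.ecs_service_name
  docdb_cluster_id = module.keystore.cluster_id

  # Optional overrides; the values below repeat the documented defaults.
  ecs_cpu_threshold                     = 80
  ecs_memory_threshold                  = 80
  docdb_cpu_threshold                   = 80
  docdb_memory_threshold                = 4
  docdb_low_memory_throttling_threshold = 2
}
```

The threshold inputs can be omitted entirely when the defaults are acceptable; only `context`, `webhook_url`, the two ECS names and the DocumentDB cluster ID are marked as required.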
65 changes: 65 additions & 0 deletions terraform/cloudwatch/alarms_docdb.tf
@@ -0,0 +1,65 @@
resource "aws_cloudwatch_metric_alarm" "docdb_cpu_utilization" {
  alarm_name        = "${local.alarm_prefix} - DocumentDB CPU Utilization"
  alarm_description = "${local.alarm_prefix} - DocumentDB CPU utilization is high (over ${var.docdb_cpu_threshold}%)"

  namespace = module.cloudwatch.namespaces.DocumentDB
  dimensions = {
    DBClusterIdentifier = var.docdb_cluster_id
  }
  metric_name = module.cloudwatch.metrics.DocumentDB.CPUUtilization

  evaluation_periods = local.evaluation_periods
  period             = local.period

  statistic           = module.cloudwatch.statistics.Average
  comparison_operator = module.cloudwatch.operators.GreaterThanOrEqualToThreshold
  threshold           = var.docdb_cpu_threshold
  treat_missing_data  = "breaching"

  alarm_actions             = [aws_sns_topic.webhook.arn]
  insufficient_data_actions = [aws_sns_topic.webhook.arn]
}

resource "aws_cloudwatch_metric_alarm" "docdb_available_memory" {
  alarm_name        = "${local.alarm_prefix} - DocumentDB Available Memory"
  alarm_description = "${local.alarm_prefix} - DocumentDB available memory is low (less than ${var.docdb_memory_threshold}GiB)"

  namespace = module.cloudwatch.namespaces.DocumentDB
  dimensions = {
    DBClusterIdentifier = var.docdb_cluster_id
  }
  metric_name = module.cloudwatch.metrics.DocumentDB.FreeableMemory

  evaluation_periods = local.evaluation_periods
  period             = local.period

  statistic           = module.cloudwatch.statistics.Average
  comparison_operator = module.cloudwatch.operators.LessThanOrEqualToThreshold
  # FreeableMemory is reported in bytes; the threshold variable is in GiB (1 GiB = 1024^3 bytes).
  threshold          = var.docdb_memory_threshold * pow(1024, 3)
  treat_missing_data = "breaching"

  alarm_actions             = [aws_sns_topic.webhook.arn]
  insufficient_data_actions = [aws_sns_topic.webhook.arn]
}

resource "aws_cloudwatch_metric_alarm" "docdb_low_memory_throttling" {
  alarm_name        = "${local.alarm_prefix} - DocumentDB Low Memory Throttling"
  alarm_description = "${local.alarm_prefix} - DocumentDB is throttling operations due to low memory"

  namespace = module.cloudwatch.namespaces.DocumentDB
  dimensions = {
    DBClusterIdentifier = var.docdb_cluster_id
  }
  metric_name = module.cloudwatch.metrics.DocumentDB.LowMemNumOperationsThrottled

  evaluation_periods = local.evaluation_periods
  period             = local.period

  statistic           = module.cloudwatch.statistics.Maximum
  comparison_operator = module.cloudwatch.operators.GreaterThanThreshold
  threshold           = var.docdb_low_memory_throttling_threshold
  treat_missing_data  = "breaching"

  alarm_actions             = [aws_sns_topic.webhook.arn]
  insufficient_data_actions = [aws_sns_topic.webhook.arn]
}
45 changes: 45 additions & 0 deletions terraform/cloudwatch/alarms_ecs.tf
@@ -0,0 +1,45 @@
resource "aws_cloudwatch_metric_alarm" "ecs_cpu_utilization" {
  alarm_name        = "${local.alarm_prefix} - ECS CPU Utilization"
  alarm_description = "${local.alarm_prefix} - ECS CPU utilization is high (over ${var.ecs_cpu_threshold}%)"

  namespace = module.cloudwatch.namespaces.ECS
  dimensions = {
    ClusterName = var.ecs_cluster_name
    ServiceName = var.ecs_service_name
  }
  metric_name = module.cloudwatch.metrics.ECS.CPUUtilization

  evaluation_periods = local.evaluation_periods
  period             = local.period

  statistic           = module.cloudwatch.statistics.Average
  comparison_operator = module.cloudwatch.operators.GreaterThanOrEqualToThreshold
  threshold           = var.ecs_cpu_threshold
  treat_missing_data  = "breaching"

  alarm_actions             = [aws_sns_topic.webhook.arn]
  insufficient_data_actions = [aws_sns_topic.webhook.arn]
}

resource "aws_cloudwatch_metric_alarm" "ecs_mem_utilization" {
  alarm_name        = "${local.alarm_prefix} - ECS Memory Utilization"
  alarm_description = "${local.alarm_prefix} - ECS Memory utilization is high (over ${var.ecs_memory_threshold}%)"

  namespace = module.cloudwatch.namespaces.ECS
  dimensions = {
    ClusterName = var.ecs_cluster_name
    ServiceName = var.ecs_service_name
  }
  metric_name = module.cloudwatch.metrics.ECS.MemoryUtilization

  evaluation_periods = local.evaluation_periods
  period             = local.period

  statistic           = module.cloudwatch.statistics.Average
  comparison_operator = module.cloudwatch.operators.GreaterThanOrEqualToThreshold
  threshold           = var.ecs_memory_threshold
  treat_missing_data  = "breaching"

  alarm_actions             = [aws_sns_topic.webhook.arn]
  insufficient_data_actions = [aws_sns_topic.webhook.arn]
}
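
The alarms above reference `aws_sns_topic.webhook`, `local.alarm_prefix`, `local.evaluation_periods` and `local.period`, none of which are defined in the two files shown here. The following is a rough sketch of what such definitions could look like, assuming an SNS topic with an HTTPS subscription pointing at the module's `webhook_url` input. The topic name and all local values are hypothetical placeholders, not the values used by this PR.

```hcl
# Hypothetical supporting definitions; names and values are placeholders.
locals {
  alarm_prefix       = "example-app - dev" # assumption: any human-readable prefix
  evaluation_periods = 1                   # assumption: number of periods evaluated
  period             = 60                  # assumption: period length in seconds
}

# SNS topic that every alarm publishes to.
resource "aws_sns_topic" "webhook" {
  name = "cloudwatch-webhook" # hypothetical topic name
}

# Forward messages published to the topic to the webhook over HTTPS.
resource "aws_sns_topic_subscription" "webhook" {
  topic_arn = aws_sns_topic.webhook.arn
  protocol  = "https"
  endpoint  = var.webhook_url
}
```

Routing every alarm through a single topic keeps each `aws_cloudwatch_metric_alarm` free of webhook details, so changing the destination only touches the subscription.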