Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* set filterExternalLabels to false * comment remote read temporarily * enable remote read * # filterExternalLabels: false comment out * remove filter external label * Changes for ms * gitlab seed token * Refresh tempaltes * Change in gitlabci * eks netbird integration * managed_external_name_map * Enable Ceph debugging pod deployment * added cloudwatch exporter initial config * added dashboards for ec2 and rds * template data * make cloudwatch exporter config using yaml file * adding netbird inputs for eks * add exporter update labels remove tmp update paths added resources for exporter remove subpath * template path * tempalate * add credentials * add port in container definition * use named port in targetPort * add servce monitor * use 60sec interval for scrapping * add scrap_job label * change indentation for labels * remove cloud watch exporter * Eks oidc * fetch cloud watch credentails from vault * use vault secret store * fix secret store ref * fix the secret * fix relabel config * remove external secret * Revert "remove external secret" This reverts commit ba0ed04. 
* add aws ec2 dashboard * adding changes * adding the missing change * Removing unwanted file * oidc config key * add average NetworkIn metric * formatting * add sum to NetworkIn * add NetworkOut * commenting out * add disk i/o ops * added status check metrics * fix aws dashboards foldername * fix grafana dashboard name * netbird installation changes * wireguard healthcheck port * renamed ec2 and rds dashboards * added ebs dashboard * added rds and ebs dashboards * add exporterd tag on metrics * fix cloud watch exporter config * monitor another VM * remove exported tags * only pull InstanceId metric * add EBS metrics * include eks also for ms access * add nftables flag * add cloudwatch billing dashboard * added cloudwatch requests per min dashboards * added cloudwatch billing dashboard in git * revert nft env change * comment out duplicated dashboard * add back aws-cloudwatch-billing * use dynamic tag for aws dashboards * use dynamic tag for all mimir dashboards * update dashboard * try adding pre bootstrap with yum install * try configuring ipvs mode for kube-proxy add one * added more charts * revert kube-proxy ipvs change * change poll to 60m * added cloudwatch integration documentation * increasing the objects limit * fix * change in refresh templates * add cc cidr block for snat env var on eks vpc cni * missed group var * add cidr block var * add service account for cloudwatch exporter * add namespace to service account * rename cloudwatch exporter SA * add role annotation * remove hardcoded role * setting AWS_VPC_K8S_CNI_EXTERNALSNAT to true * backtunnel access * cni config * eks k8s version * netbird route * Add data source for cloud-init and bastion instances * add bastion launch template and asg resource * Output required bastion information in base-k8s module * define vars for bastion asg * Output required bastion information in base-k8s module * Output required bastion information in eks module * Output required bastion information in managed svs module 
* change in route * chnage in backtunnel route * remove route53 for bastion * netbird provider * netbird disable rotation * remove commented code not required * ami * change config for new dev tf version with rotation * add example policy for development * Revert "add example policy for development" This reverts commit 57b356a. * change in msgw route * correction * update cloudwatch architecture doc * added RDS configs * update region * removing poll 60m * update cloud-watch-integration-arch * use max for read and write latency * comment out EC2 metrics available via node-exporter * comment out cloudwatch exporter * comment out dashboards * pr corrections --------- Co-authored-by: muzammil360 <muzammil360@gmail.com> Co-authored-by: Josphat Mutai <josphatkmutai@gmail.com> Co-authored-by: David Fry <david.fry@modusbox.com>
1 parent 400517f, commit d59f9dc
Showing 55 changed files with 3,407 additions and 95 deletions.
496 changes: 496 additions & 0 deletions
assets/grafana-dashboards/aws-cloudwatch/cloudwatch-billing.json
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
# Problem | ||
Some components of the Mojaloop software may operate as AWS managed services, which report their metrics to AWS CloudWatch. The operations team needs these performance and health metrics to be available in a centralized Grafana monitoring dashboard. | ||
|
||
# Solution | ||
The solution involves retrieving metrics from CloudWatch to evaluate the performance and health of AWS managed services, and integrating them into Prometheus. Grafana dashboards can then query these metrics from Prometheus. | ||
|
||
The diagram below depicts the architecture of the proposed system. | ||
|
||
![diagram](./cloudwatch-integraton-architecture.svg) | ||
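
As a rough illustration of the scrape path described above, a Prometheus Operator `ServiceMonitor` along the following lines would let Prometheus pull the exporter's metrics. This is only a sketch: the resource name, namespace, label selector, and port name are hypothetical, and it assumes the Prometheus Operator CRDs are installed. The 60-second interval mirrors the scrape interval mentioned in the commit message.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cloudwatch-exporter          # hypothetical name
  namespace: monitoring              # assumed namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cloudwatch-exporter   # assumed label on the exporter Service
  endpoints:
    - port: http-metrics             # named port on the Service (assumed name)
      path: /metrics
      interval: 60s                  # CloudWatch data is coarse, so a slow scrape is sufficient
```

Referring to the Service port by name rather than by number keeps the monitor valid if the container port changes.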
|
||
# Implementation details | ||
|
||
## Exporter options | ||
Two options are available. | ||
1. [CloudWatch Exporter](https://github.com/prometheus/cloudwatch_exporter/) | ||
2. [YACE - yet another cloudwatch exporter](https://github.com/nerdswords/yet-another-cloudwatch-exporter) | ||
|
||
We chose the second option, YACE, because it includes a mixin with prebuilt dashboards for various services like EC2, EBS, S3, and RDS, which reduces the setup effort. | ||
|
||
## Authentication | ||
The CloudWatch exporter needs to authenticate with the AWS CloudWatch API. YACE uses the AWS SDK for Go, which lets us authenticate via [AWS's default credential chain](https://aws.github.io/aws-sdk-go-v2/docs/configuring-sdk/#specifying-credentials). We have two relevant options: | ||
|
||
|
||
1. Expose credentials as environment variables | ||
2. Associate an AWS IAM policy with the exporter pod | ||
|
||
Option 1 uses long-lived static credentials, while Option 2 enables short-lived, more secure authentication tokens. Currently, we are using Option 1 to accelerate development. | ||
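
The two options could look roughly like the fragments below. Both are sketches under stated assumptions: the Secret name, its keys, the ServiceAccount name, the namespace, and the role ARN are placeholders, and Option 2 is shown using IAM Roles for Service Accounts (IRSA), which is one common way to attach an IAM role to a pod on EKS.

```yaml
# Option 1 (container spec fragment): static credentials read from a Kubernetes Secret.
# The Secret name and keys are hypothetical.
env:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: cloudwatch-exporter-credentials
        key: aws_access_key_id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: cloudwatch-exporter-credentials
        key: aws_secret_access_key
---
# Option 2: a ServiceAccount annotated with an IAM role (IRSA on EKS).
# The exporter pod must run under this ServiceAccount.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloudwatch-exporter          # hypothetical name
  namespace: monitoring              # assumed namespace
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/cloudwatch-exporter   # placeholder ARN
```

With Option 2 the SDK's default credential chain picks up the projected web identity token on its own, so no AWS keys need to be stored in the cluster.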
|
||
## Target Discovery | ||
YACE can discover and filter resource targets based on tags. To maintain consistency, we should use the `monitoring_enabled:true` tag on all resources that need to be monitored. |
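
A discovery job that applies this tag convention might look like the sketch below. The job type, region, and metric are examples only; the overall layout follows the exporter configuration added in this commit.

```yaml
apiVersion: v1alpha1
discovery:
  jobs:
    - type: AWS/RDS
      regions: [eu-west-1]           # example region; set per environment
      searchTags:
        - key: monitoring_enabled
          value: "true"              # quoted so YAML keeps it a string rather than a boolean
      period: 300
      length: 300
      metrics:
        - name: CPUUtilization
          statistics: [Maximum]
```

Using the same tag for every job means a resource is brought into monitoring by tagging it in AWS, with no change to the exporter configuration.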
65 changes: 65 additions & 0 deletions
gitops/applications/base/monitoring-post-config/dashboards-aws-managed-svs.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# apiVersion: grafana.integreatly.org/v1beta1 | ||
# kind: GrafanaFolder | ||
# metadata: | ||
# name: aws-managed-services | ||
# spec: | ||
# instanceSelector: | ||
# matchLabels: | ||
# dashboards: "grafana" | ||
# --- | ||
# apiVersion: grafana.integreatly.org/v1beta1 | ||
# kind: GrafanaDashboard | ||
# metadata: | ||
# name: aws-ec2 | ||
# spec: | ||
# folder: aws-managed-services | ||
# datasources: | ||
# - inputName: "DS_PROMETHEUS" | ||
# datasourceName: "${ARGOCD_ENV_dashboard_datasource_name}" | ||
# instanceSelector: | ||
# matchLabels: | ||
# dashboards: "grafana" | ||
# url: https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/aws-ec2.json | ||
# --- | ||
# apiVersion: grafana.integreatly.org/v1beta1 | ||
# kind: GrafanaDashboard | ||
# metadata: | ||
# name: aws-rds | ||
# spec: | ||
# folder: aws-managed-services | ||
# datasources: | ||
# - inputName: "DS_PROMETHEUS" | ||
# datasourceName: "${ARGOCD_ENV_dashboard_datasource_name}" | ||
# instanceSelector: | ||
# matchLabels: | ||
# dashboards: "grafana" | ||
# url: https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/aws-rds.json | ||
# --- | ||
# apiVersion: grafana.integreatly.org/v1beta1 | ||
# kind: GrafanaDashboard | ||
# metadata: | ||
# name: aws-ebs | ||
# spec: | ||
# folder: aws-managed-services | ||
# datasources: | ||
# - inputName: "DS_PROMETHEUS" | ||
# datasourceName: "${ARGOCD_ENV_dashboard_datasource_name}" | ||
# instanceSelector: | ||
# matchLabels: | ||
# dashboards: "grafana" | ||
# url: https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/aws-ebs.json | ||
# --- | ||
# apiVersion: grafana.integreatly.org/v1beta1 | ||
# kind: GrafanaDashboard | ||
# metadata: | ||
# name: aws-cloudwatch-billing | ||
# spec: | ||
# folder: aws-managed-services | ||
# datasources: | ||
# - inputName: "DS_PROMETHEUS" | ||
# datasourceName: "${ARGOCD_ENV_dashboard_datasource_name}" | ||
# instanceSelector: | ||
# matchLabels: | ||
# dashboards: "grafana" | ||
# url: https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/assets/grafana-dashboards/aws-cloudwatch/cloudwatch-billing.json | ||
# --- |
101 changes: 101 additions & 0 deletions
gitops/applications/base/monitoring/cloudwatch-exporter-config.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
# apiVersion: v1alpha1 | ||
# sts-region: us-west-2 # TODO: do not hardcode. Understand what it is | ||
# discovery: | ||
# jobs: | ||
# - type: AWS/EC2 | ||
# regions: [us-west-2] # TODO: do not hardcode, understand what it is | ||
# includeContextOnInfoMetrics: true | ||
# searchTags: | ||
# - key: Name | ||
# value: "Forem to Orbit bridge" | ||
# dimensionNameRequirements: | ||
# - InstanceId | ||
# period: 300 | ||
# length: 300 | ||
# metrics: | ||
# - name: CPUUtilization | ||
# statistics: [Maximum] | ||
# # - name: NetworkIn | ||
# # statistics: [Average, Sum] | ||
# # - name: NetworkOut | ||
# # statistics: [Average, Sum] | ||
# # - name: NetworkPacketsIn | ||
# # statistics: [Sum] | ||
# # - name: NetworkPacketsOut | ||
# # statistics: [Sum] | ||
# # - name: DiskReadBytes | ||
# # statistics: [Sum] | ||
# # - name: DiskWriteBytes | ||
# # statistics: [Sum] | ||
# # - name: DiskReadOps | ||
# # statistics: [Sum] | ||
# # - name: DiskWriteOps | ||
# # statistics: [Sum] | ||
# - name: StatusCheckFailed | ||
# statistics: [Sum] | ||
# - name: StatusCheckFailed_Instance | ||
# statistics: [Sum] | ||
# - name: StatusCheckFailed_System | ||
# statistics: [Sum] | ||
# - type: AWS/EBS | ||
# regions: [us-west-2] # TODO: do not hardcode, understand what it is | ||
# includeContextOnInfoMetrics: true | ||
# searchTags: # update the search tag later | ||
# - key: Name | ||
# value: forem-community.mojaloop.io | ||
# dimensionNameRequirements: | ||
# - VolumeId | ||
# period: 300 | ||
# length: 300 | ||
# metrics: | ||
# - name: VolumeReadBytes | ||
# statistics: [Sum] | ||
# - name: VolumeWriteBytes | ||
# statistics: [Sum] | ||
# - name: VolumeReadOps | ||
# statistics: [Average] | ||
# - name: VolumeWriteOps | ||
# statistics: [Average] | ||
# - name: VolumeIdleTime | ||
# statistics: [Average] | ||
# - name: VolumeTotalReadTime | ||
# statistics: [Average] | ||
# - name: VolumeTotalWriteTime | ||
# statistics: [Average] | ||
# - name: VolumeQueueLength | ||
# statistics: [Average] | ||
# - name: BurstBalance | ||
# statistics: [Average] | ||
# - type: AWS/RDS | ||
# regions: [eu-west-1] # TODO: do not hardcode, understand what it is | ||
# includeContextOnInfoMetrics: true | ||
# searchTags: # update the search tag later | ||
# - key: mojaloop/owner | ||
# value: Samuel-Kummary # TODO: update target tags | ||
# dimensionNameRequirements: | ||
# - DBInstanceIdentifier | ||
# period: 300 | ||
# length: 300 | ||
# metrics: | ||
# - name: CPUUtilization | ||
# statistics: [Maximum] | ||
# - name: CPUUtilization | ||
# statistics: [Maximum] | ||
# - name: DatabaseConnections | ||
# statistics: [Sum] | ||
# - name: FreeStorageSpace | ||
# statistics: [Average] | ||
# - name: FreeableMemory | ||
# statistics: [Average] | ||
# - name: ReadThroughput | ||
# statistics: [Average] | ||
# - name: WriteThroughput | ||
# statistics: [Average] | ||
# - name: ReadIOPS | ||
# statistics: [Average] | ||
# - name: WriteIOPS | ||
# statistics: [Average] | ||
# - name: ReadLatency | ||
# statistics: [Maximum] | ||
# - name: WriteLatency | ||
# statistics: [Maximum] |