rererecursive/cloudwatch-monitoring

cloudwatch-monitoring

Configuration

Create a symlink named "ciinaboxes" that points to your base2-ciinabox repo (as is done for ciinabox-jenkins).

Example (with cloudwatch-monitoring and base2-ciinabox in the same directory):

cd cloudwatch-monitoring
ln -s ../base2-ciinabox ciinaboxes

Usage

rake cfn:generate <customer> [application]

| Parameter | Value |
| --- | --- |
| customer | The customer's ciinabox name (directory in base2-ciinabox repo) |
| application | (Optional) For use when a customer has multiple applications |

Alarm configuration

All configuration takes place in the base2-ciinabox repo under the customer's ciinabox directory. Create a directory named "monitoring" (similar to the "jenkins" directory for ciinabox-jenkins). This directory will contain the "alarms.yml" file and an optional "templates.yml" file.

alarms.yml

This file is used to configure the AWS resources you want to monitor with CloudWatch.

source_bucket: [Name of S3 bucket where CloudFormation templates will be deployed]
source_region: [Region of source_bucket]

resources:
  [nested stack name].[resource name]: [template name]

Example:

source_bucket: source.customer.com

resources:
  RDSStack.RDS: RDSInstance

Resources

Resources are referenced by the CloudFormation logical resource ID used to create them. Nested stacks are also referenced by their CloudFormation logical resource ID. See example above.

Target group configuration:

Target group alarms in CloudWatch require dimensions for both the target group and its associated load balancer. To configure a target group alarm, provide the logical ID of the target group (including any stacks it is nested under), followed by "/", followed by the logical ID of the load balancer (also including any stacks it is nested under).

Example:

resources:
  LoadBalancerStack.WebDefTargetGroup/LoadBalancerStack.WebLoadBalancer: ApplicationELBTargetGroup
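
The key format described above can be illustrated with a minimal Python sketch. The function names `split_logical_path` and `parse_target_group_key` and the returned dictionary layout are hypothetical, chosen only for illustration; they are not taken from this repo's code.

```python
def split_logical_path(path):
    """Split 'StackA.StackB.Resource' into (['StackA', 'StackB'], 'Resource')."""
    *stacks, resource = path.split(".")
    return stacks, resource


def parse_target_group_key(key):
    """Parse '<target group path>/<load balancer path>' (hypothetical helper)."""
    target_group, sep, load_balancer = key.partition("/")
    if not sep or not target_group or not load_balancer:
        raise ValueError(f"expected '<target group>/<load balancer>', got {key!r}")
    tg_stacks, tg_id = split_logical_path(target_group)
    lb_stacks, lb_id = split_logical_path(load_balancer)
    return {
        "target_group": {"stacks": tg_stacks, "logical_id": tg_id},
        "load_balancer": {"stacks": lb_stacks, "logical_id": lb_id},
    }


parsed = parse_target_group_key(
    "LoadBalancerStack.WebDefTargetGroup/LoadBalancerStack.WebLoadBalancer"
)
print(parsed["target_group"])
print(parsed["load_balancer"])
```

Both halves of the key use the same dotted path convention as ordinary resources, so one splitting helper covers the target group and the load balancer.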

Custom Metrics

Custom metrics are configured with a syntax similar to resources. Use the metrics key instead of resources.

Example:

metrics:
  MyCustomMetric: MyCustomMetricTemplate

Endpoints

HTTP endpoint monitoring and alerting is enabled by configuring resources under endpoints. Each endpoint creates a CloudWatch event that is scheduled to trigger the aws-lambda-http-check Lambda function deployed with this stack. Alarms are configured (based on the specified template) to alert on the CloudWatch metrics generated by the Lambda function.

Example:

endpoints:
  http://www.base2services.com:
    template: HttpCheck
    statusCode: 200
    bodyRegex: 'DevOps'

Example (POST request with a payload):

endpoints:
  http://www.base2services.com:
    template: HttpCheck
    statusCode: 200
    bodyRegex: 'DevOps'
    payload: id_=123
    method: POST

Supported parameters:

| Key | Value | Default |
| --- | --- | --- |
| statusCode | The expected response code | 200 |
| bodyRegex | A regex expected in the response body | Disabled |
| timeOut | A timeout value for the endpoint monitoring | 120 seconds |
| scheduleExpression | A cron expression used to schedule the endpoint monitoring | Every minute |
| environments | A string or array of environment names. Monitoring will only be deployed for these environments (if specified) | All environments |
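
The defaults and the environments filter listed above can be sketched in Python as follows. This is only an illustration of the documented behaviour; the names `ENDPOINT_DEFAULTS`, `endpoint_settings` and `deployed_in` are hypothetical and do not come from this repo.

```python
# Defaults taken from the table above (timeOut is in seconds).
ENDPOINT_DEFAULTS = {"statusCode": 200, "timeOut": 120}


def endpoint_settings(config):
    """Return the endpoint config with documented defaults filled in."""
    return {**ENDPOINT_DEFAULTS, **config}


def deployed_in(config, environment):
    """Return True if monitoring should be deployed for this environment."""
    environments = config.get("environments")
    if environments is None:
        return True  # no filter specified: all environments
    if isinstance(environments, str):
        environments = [environments]  # a single name is allowed
    return environment in environments


endpoint = {"template": "HttpCheck", "bodyRegex": "DevOps", "environments": "prod"}
print(endpoint_settings(endpoint))
print(deployed_in(endpoint, "prod"), deployed_in(endpoint, "dev"))
```

Wrapping a single string in a list before the membership test avoids substring matches such as "prod" being found inside "production".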

Multiple templates

You can specify multiple templates for the resource by providing a list/array. You may want to do this if you want to deploy some custom alarms in addition to the default alarms for a resource.

Example:

resources:
  RDSStack.RDS: [ 'RDSInstance', 'MyRDSInstance' ]

or

resources:
  RDSStack.RDS:
    - RDSInstance
    - MyRDSInstance
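
Because a resource can map to either a single template name or a list of names, code that reads alarms.yml has to accept both shapes. A minimal Python sketch of that normalisation, assuming the file has already been loaded into a dictionary; the helper name `normalize_templates` and the resource `RDSStack.Replica` are hypothetical:

```python
def normalize_templates(value):
    """Return the template names for a resource as a list."""
    if isinstance(value, str):
        return [value]
    return list(value)


# Stand-in for the parsed "resources" section of alarms.yml.
resources = {
    "RDSStack.RDS": ["RDSInstance", "MyRDSInstance"],
    "RDSStack.Replica": "RDSInstance",  # hypothetical resource
}

for resource, value in resources.items():
    print(resource, normalize_templates(value))
```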

Auto generate alarms config for resources

You can query an existing stack for monitorable resources using the query rake task. This will provide a list of resources in the correct config syntax, including the nested stacks and the default templates for those resources.

Example:

eval $(elmer get-creds [customer] prod --format shell)
rake cfn:query <region> <stack> <customer> [application]

| Parameter | Value |
| --- | --- |
| region | The region of the stack you are querying (e.g. ap-southeast-2) |
| stack | The name of the stack you are querying (e.g. prod) |
| customer | The customer's ciinabox name (directory in base2-ciinabox repo) |
| application | (Optional) For use when a customer has multiple applications |

Make sure you query a prod-sized stack so that all conditional resources are included. The output lists all monitorable resources found in the stack, the coverage your current alarms.yml config provides, and any resources missing from your current alarms.yml config.

Templates

The "template" value you specify for a resource refers to either a default template in the config/templates.yml file of this repo, or a custom/override template in the monitoring/templates.yml file of the customer's ciinabox monitoring directory. This template can contain multiple alarms. The example below shows the default RDSInstance template, which has 2 alarms (FreeStorageSpaceCrit and FreeStorageSpaceTask). Using the RDSInstance template in this example will create 2 CloudWatch alarms for the RDS resource in the RDSStack nested stack.

Example: alarms.yml

resources:
  RDSStack.RDS: RDSInstance

Example: templates.yml

templates:
  RDSInstance: # AWS::RDS::DBInstance
    FreeStorageSpaceCrit:
      AlarmActions: crit
      Namespace: AWS/RDS
      MetricName: FreeStorageSpace
      ComparisonOperator: LessThanThreshold
      DimensionsName: DBInstanceIdentifier
      Statistic: Minimum
      Threshold: 50000000000
      Threshold.development: 10000000000
      EvaluationPeriods: 1
    FreeStorageSpaceTask:
      AlarmActions: task
      Namespace: AWS/RDS
      MetricName: FreeStorageSpace
      ComparisonOperator: LessThanThreshold
      DimensionsName: DBInstanceIdentifier
      Statistic: Minimum
      Threshold: 100000000000
      Threshold.development: 20000000000
      EvaluationPeriods: 1

templates.yml

You should start by using the default templates in cloudwatch-monitoring/config/templates.yml and override, replace or augment them with custom templates in base2-ciinabox/[customer]/monitoring/templates.yml as required.

Globally overriding a template

You can override a default template in the customer's templates.yml file if all instances of a particular resource require a non-standard configuration for that customer.

Example:

templates:
  RDSInstance:
    FreeStorageSpaceCrit:
      Threshold: 80000000000

This configuration will be merged over the default RDSInstance template resulting in the following:

templates:
  RDSInstance:
    FreeStorageSpaceCrit:
      AlarmActions: crit
      Namespace: AWS/RDS
      MetricName: FreeStorageSpace
      ComparisonOperator: LessThanThreshold
      DimensionsName: DBInstanceIdentifier
      Statistic: Minimum
      Threshold: 80000000000
      Threshold.development: 10000000000
      EvaluationPeriods: 1
    FreeStorageSpaceTask:
      AlarmActions: task
      Namespace: AWS/RDS
      MetricName: FreeStorageSpace
      ComparisonOperator: LessThanThreshold
      DimensionsName: DBInstanceIdentifier
      Statistic: Minimum
      Threshold: 100000000000
      Threshold.development: 20000000000
      EvaluationPeriods: 1
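
The merge described above behaves like a recursive (deep) dictionary merge: nested keys from the customer file replace the matching default keys, and everything else is kept. A minimal Python sketch, assuming both files are already loaded as dictionaries; `deep_merge` is an illustrative helper, not this repo's implementation, and the default template is abbreviated:

```python
import copy


def deep_merge(base, override):
    """Return a new dict with override merged recursively over base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {
    "RDSInstance": {
        "FreeStorageSpaceCrit": {"Statistic": "Minimum", "Threshold": 50000000000},
        "FreeStorageSpaceTask": {"Statistic": "Minimum", "Threshold": 100000000000},
    }
}
customer = {"RDSInstance": {"FreeStorageSpaceCrit": {"Threshold": 80000000000}}}

merged = deep_merge(defaults, customer)
print(merged["RDSInstance"]["FreeStorageSpaceCrit"])
```

Only the overridden Threshold changes; the FreeStorageSpaceTask alarm and the remaining FreeStorageSpaceCrit parameters are carried over from the defaults.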

Create a custom template

If the default template for your resource is completely inappropriate, you can create your own custom template in the monitoring/templates.yml file.

Example:

templates:
  MyRDSInstance:
    DatabaseConnections:
      AlarmActions: crit
      Namespace: AWS/RDS
      MetricName: DatabaseConnections
      ComparisonOperator: GreaterThanThreshold
      DimensionsName: DBInstanceIdentifier
      Statistic: Average
      Threshold: 20
      EvaluationPeriods: 5

Inherit a template

If you have multiple instances of a particular resource and you want to adjust the configuration for only some of them, you can create your own custom template which inherits the configuration of a default template.

Example:

templates:
  MyRDSInstance:
    template: RDSInstance
    FreeStorageSpaceCrit:
      Threshold: 80000000000

The above example creates a new template MyRDSInstance which can now be used by one or many resources. The MyRDSInstance template inherits all of the alarms and configuration from RDSInstance, but sets Threshold to 80000000000 for the FreeStorageSpaceCrit alarm.
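
Inheritance of this kind can be pictured as resolving the parent template first and then layering the child's alarm parameters over it. The Python sketch below assumes default and customer templates have been combined into one dictionary; `resolve_template` is a hypothetical helper and the default template is abbreviated:

```python
def resolve_template(name, templates, seen=()):
    """Resolve a template, following its optional 'template' parent key."""
    if name in seen:
        raise ValueError(f"circular template inheritance at {name!r}")
    template = dict(templates[name])
    parent_name = template.pop("template", None)
    if parent_name is None:
        return template
    parent = resolve_template(parent_name, templates, seen + (name,))
    # Copy the parent's alarms, then apply the child's parameters on top.
    resolved = {alarm: dict(params) for alarm, params in parent.items()}
    for alarm, params in template.items():
        resolved.setdefault(alarm, {}).update(params)
    return resolved


templates = {
    "RDSInstance": {
        "FreeStorageSpaceCrit": {"AlarmActions": "crit", "Threshold": 50000000000},
        "FreeStorageSpaceTask": {"AlarmActions": "task", "Threshold": 100000000000},
    },
    "MyRDSInstance": {
        "template": "RDSInstance",
        "FreeStorageSpaceCrit": {"Threshold": 80000000000},
    },
}

print(resolve_template("MyRDSInstance", templates))
```

The parent template is left unmodified, so resources that still use RDSInstance keep the default thresholds.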

Environment type mappings

You can create environment type mappings if alarm configurations need to differ between different environment types. This may be useful in situations where development type environments are running different resource quantities or sizes.

Example:

templates:
  RDSInstance:
    FreeStorageSpaceCrit:
      Threshold: 40000000000
      Threshold.development: 20000000000
      Threshold.staging: 30000000000
      EvaluationPeriods: 5

The above example shows different Threshold values for EnvironmentType values of production (default), development or staging. Any value can be specified using the .envType syntax, and the necessary mappings and EnvironmentType parameter will be generated when the templates are rendered. The EvaluationPeriods value for development and staging type environments will be 5 in the above example, as no .envType values were provided for this parameter.
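
The lookup rule can be sketched in Python as follows. Note that this only models which value applies to a given environment type; the tool itself renders CloudFormation mappings, and the helper `resolve_for_environment` is hypothetical.

```python
def resolve_for_environment(alarm, env_type):
    """Pick the '<Parameter>.<envType>' value when present, else the base value."""
    # Start from the base (unsuffixed) parameters.
    resolved = {key: value for key, value in alarm.items() if "." not in key}
    suffix = "." + env_type
    for key, value in alarm.items():
        if key.endswith(suffix):
            resolved[key[: -len(suffix)]] = value
    return resolved


alarm = {
    "Threshold": 40000000000,
    "Threshold.development": 20000000000,
    "Threshold.staging": 30000000000,
    "EvaluationPeriods": 5,
}

for env_type in ("production", "development", "staging"):
    print(env_type, resolve_for_environment(alarm, env_type))
```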

Supported Parameters:

| Parameter | Mapping support |
| --- | --- |
| ActionsEnabled | true |
| AlarmActions | false |
| AlarmDescription | false |
| ComparisonOperator | true |
| Dimensions | false |
| EvaluateLowSampleCountPercentile | false |
| EvaluationPeriods | true |
| ExtendedStatistic | false |
| InsufficientDataActions | false |
| MetricName | true |
| Namespace | true |
| OKActions | false |
| Period | true |
| Statistic | true |
| Threshold | true |
| TreatMissingData | true |
| Unit | false |

Template variables

The following variables can be used in templates:

| Variable Key | Variable Value |
| --- | --- |
| ${name} | Metric/Resource Name (from alarms.yml) |
| ${metric} | Metric Name (from alarms.yml) |
| ${resource} | Resource Name (from alarms.yml) |
| ${templateName} | Template Name (from templates.yml) |
| ${alarmName} | Alarm Name (from templates.yml) |

Example:

alarms.yml

metrics:
  Metric1: MyCustomMetric

templates.yml

templates:
  MyCustomMetric:
    ItemCountHigh:
      MetricName: ${metric}
      AlarmDescription: '${templateName} ${alarmName} - ${name}'

Result:

templates:
  MyCustomMetric:
    ItemCountHigh:
      MetricName: Metric1
      AlarmDescription: 'MyCustomMetric ItemCountHigh - Metric1'
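
The substitution can be reproduced with Python's standard `string.Template`, which uses the same `${variable}` placeholder syntax as the variable keys listed above. This is an illustrative sketch only; how the tool performs the substitution internally is not shown here, and the `render` helper is hypothetical.

```python
from string import Template


def render(value, variables):
    """Replace ${variable} placeholders, leaving unknown ones untouched."""
    return Template(value).safe_substitute(variables)


variables = {
    "name": "Metric1",
    "metric": "Metric1",
    "templateName": "MyCustomMetric",
    "alarmName": "ItemCountHigh",
}
alarm = {
    "MetricName": "${metric}",
    "AlarmDescription": "${templateName} ${alarmName} - ${name}",
}

rendered = {key: render(value, variables) for key, value in alarm.items()}
print(rendered)
```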

Alarm Actions

There are 3 classes of alarm actions: crit, warn and task.

| Action | Process |
| --- | --- |
| crit | Alert on-call technician |
| warn | Create alarm in pager service but do not alert on-call technician |
| task | Create support ticket for investigation |

An SNS topic is required for each alarm action. These topics and their subscriptions are managed outside this stack.

Deployment

The rendered CloudFormation templates should be deployed to [source_bucket]/cloudformation/monitoring/.

eval $(elmer get-creds [customer] ops --format shell)
rake cfn:deploy <customer> [application]

| Parameter | Value |
| --- | --- |
| customer | The customer's ciinabox name (directory in base2-ciinabox repo) |
| application | (Optional) For use when a customer has multiple applications |

Launch the Monitoring stack in the desired account with the following CloudFormation parameters:

| Parameter Key | Parameter Value |
| --- | --- |
| EnvironmentType | production / development / custom env type |
| MonitoredStack | The name of the stack you want monitored (e.g. prod) |
| MonitoringDisabled | true to disable alerts, false to enable alerts |
| SnsTopicCrit | SNS topic used by crit type alarms |
| SnsTopicTask | SNS topic used by task type alarms |
| SnsTopicWarn | SNS topic used by warn type alarms |

Disabling Monitoring

It is possible to globally disable / snooze / downtime all alarms by setting the MonitoringDisabled CloudFormation parameter to true. This will disable alarm actions without removing them.

Disabling and excluding alarms

To disable or prevent creation of a specific alarm, specify either of the following parameters:

templates:
  MyAutoScalingGroup:
    template: AutoScalingGroup
    CPUUtilizationHighBase:
      CreateAlarm: false    # Don't create the alarm
      DisableAlarm: true    # Create the alarm but disable it
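
The difference between the two flags can be sketched in Python as follows. The helper `plan_alarms`, the assumed defaults (CreateAlarm true, DisableAlarm false) and the alarm name `CPUUtilizationHighTask` are assumptions made for illustration; only the flag names and `CPUUtilizationHighBase` come from the example above.

```python
def plan_alarms(template):
    """Map each alarm that will be created to whether it is enabled."""
    plan = {}
    for name, params in template.items():
        if not isinstance(params, dict):
            continue  # skip non-alarm keys such as 'template'
        if params.get("CreateAlarm", True) is False:
            continue  # the alarm is not created at all
        plan[name] = not params.get("DisableAlarm", False)
    return plan


template = {
    "template": "AutoScalingGroup",
    "CPUUtilizationHighBase": {"DisableAlarm": True},
    "CPUUtilizationHighTask": {"CreateAlarm": False},  # hypothetical alarm name
}

print(plan_alarms(template))
```

In this sketch an alarm with CreateAlarm set to false never appears in the plan, while an alarm with DisableAlarm set to true appears but is marked as disabled.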