A Terraform module to create an Amazon Web Services (AWS) EC2 Container Service (ECS) cluster.
Fork of https://github.com/azavea/terraform-aws-ecs-cluster at version 1.0.0
The authors of the original module and the author of this module have agreed to diverge in direction for the module. The author of this module is pursuing the direction of 100% automated operations on the ECS cluster. To that end, this module contains elements to provide zero-downtime rolling updates of the cluster instances.
The approach followed here to achieve zero-downtime rolling updates is an implementation of the approach described in this AWS blog post. It combines building the auto scaling group for the container instances using a cloud formation stack (which provides the rolling update mechanism), with a lambda function that is triggered on instance termination events to make sure the container instance is drained of cluster service tasks before it gets terminated.
data "template_file" "container_instance_cloud_config" {
template = "${file("cloud-config/container-instance.yml.tpl")}"
vars {
environment = "${var.environment}"
}
}
module "ecs_cluster" {
source = "git://gitlab.com/open-source-devex/terraform-modules/aws/ecs-cluster.git?ref=1.0.0"
vpc_id = "vpc-20f74844"
ami_id = "ami-b2df2ca4"
instance_type = "t2.micro"
cloud_config_content = "${data.template_file.container_instance_cloud_config.rendered}"
root_block_device_type = "gp2"
root_block_device_size = "10"
health_check_grace_period = "600"
desired_capacity = "1"
min_size = "0"
max_size = "1"
enabled_metrics = [
"GroupMinSize",
"GroupMaxSize",
"GroupDesiredCapacity",
"GroupInServiceInstances",
"GroupPendingInstances",
"GroupStandbyInstances",
"GroupTerminatingInstances",
"GroupTotalInstances",
]
vpc_private_subnet_ids = [...]
project = "Something"
environment = "Staging"
}
vpc_id
- ID of VPC meant to house clusterlookup_latest_ami
- lookup the latest Amazon-owned ECS AMI. If this variable istrue
, the latest ECS AMI will be used, even ifami_id
is provided (default:false
).ami_id
- Cluster instance Amazon Machine Image (AMI) ID. Iflookup_latest_ami
istrue
, this variable will be silently ignored.ami_owners
- List of accounts that own the AMI (default:self, amazon, aws-marketplace
)root_block_device_type
- Instance root block device type (default:gp2
)root_block_device_size
- Instance root block device size in gigabytes (default:8
)instance_type
- Instance type for cluster instances (default:t2.micro
)cloud_config_content
- user data supplied to launch configuration for cluster nodescloud_config_content_type
- the type of configuration being passed in as user data, see EC2 user guide for a list of possible types (default:text/cloud-config
)health_check_grace_period
- Time in seconds after container instance comes into service before checking health (default:600
)desired_capacity
- Number of EC2 instances that should be running in cluster (default:1
)min_size
- Minimum number of EC2 instances in cluster (default:0
)max_size
- Maximum number of EC2 instances in cluster (default:1
)termination_policies
- Policies to use when deciding which instance to terminate (default:["OldestLaunchConfiguration", "Default"]
)rolling_update_max_batch_size
- Max number of instances to update at the same time (default:1
)rolling_update_pause_time
- How much time to wait between updating instances (default:PT5M
- 5 minutes)rolling_update_wait_on_signal
- Should rolling update wait for a signal sent from each new instance before moving on to the nextenabled_metrics
- A list of metrics to gather for the clustervpc_private_subnet_ids
- A list of private subnet IDs to launch cluster instancesscale_out_cooldown_seconds
- Number of seconds before allowing another scale out activity (default:300
)scale_in_cooldown_seconds
- Number of seconds before allowing another scale in activity (default:300
)high_cpu_evaluation_periods
- Number of evaluation periods for high CPU alarm (default:2
)high_cpu_period_seconds
- Number of seconds in an evaluation period for high CPU alarm (default:300
)high_cpu_threshold_percent
- Threshold as a percentage for high CPU alarm (default:90
)low_cpu_evaluation_periods
- Number of evaluation periods for low CPU alarm (default:2
)low_cpu_period_seconds
- Number of seconds in an evaluation period for low CPU alarm (default:300
)low_cpu_threshold_percent
- Threshold as a percentage for low CPU alarm (default:10
)high_memory_evaluation_periods
- Number of evaluation periods for high memory alarm (default:2
)high_memory_period_seconds
- Number of seconds in an evaluation period for high memory alarm (default:300
)high_memory_threshold_percent
- Threshold as a percentage for high memory alarm (default:90
)low_memory_evaluation_periods
- Number of evaluation periods for low memory alarm (default:2
)low_memory_period_seconds
- Number of seconds in an evaluation period for low memory alarm (default:300
)low_memory_threshold_percent
- Threshold as a percentage for low memory alarm (default:10
)project
- Name of project this cluster is for (default: ``)environment
- Name of environment this cluster is targeting (default:env
)ecs_instance_daemon_tasks
- Number of tasks running as daemons on the cluster instances (default:0
)monitoring_enabled
- Sets the value of 'MonitoringEnabled' resource tag (default:false
)allow_ssh_in
- Set to true to configure SSH access to cluster instances (requiresssh_public_key_file
andssh_allowed_cidr
). (default:false
)ssh_public_key_file
- an SSH key pair public file. (default: ``)ssh_allowed_cidrs
- A list of CIDRs from which instances accept SSH connections. (default: ``)allow_http_out
- Set to true to configure HTTP access from cluster instances (requireshttp_allowed_cidr
). (defaultfalse
)http_allowed_cidrs
- A list of CIDRs to which instances can connect via HTTP.allow_https_out
- Set to true to configure HTTPS access from cluster instances (requireshttps_allowed_cidr
). (defaultfalse
)https_allowed_cidrs
- A list of CIDRs to which instances can connect via HTTPS.allow_consul_gossip
- Set to true to configure Consul gossip ports to and from cluster instances (requiresconsul_gossip_allowed_cidr
).consul_gossip_allowed_cidrs
- A list of CIDRs to allow Consul gossip traffic to and from.allow_consul_client_server_rdp
- Set to true to configure Consul RDP access from cluster instances (requiresconsul_client_server_rdp_allowed_cidr
).consul_client_server_rdp_allowed_cidrs
- A list of CIDRs to which instances can connect via Consul RDPallow_logzio_eu_out
- Set to true to configure (logback appender to) logz.io (EU) access from cluster instances (requireslogzio_eu_allowed_cidr
).enable_daemon_tasks
- Set to true to create the required IAM policies to allow cluster instances to start their own daemon tasks. (default:false
)
id
- The container service cluster IDname
- The container service cluster namecontainer_instance_security_group_id
- Security group ID of the EC2 container instancescontainer_instance_ecs_for_ec2_service_role_name
- Name of IAM role associated with EC2 container instancesecs_service_role_name
- Name of IAM role for use with ECS servicesecs_autoscale_role_name
- Name of IAM role for use with ECS service autoscalingecs_service_role_arn
- ARN of IAM role for use with ECS servicesecs_autoscale_role_arn
- ARN of IAM role for use with ECS service autoscalingcontainer_instance_ecs_for_ec2_service_role_arn
- ARN of IAM role associated with EC2 container instances
Contributions to this module are very welcome, please fork this repository and submit merge requests. We are trying an approach to testing based on test-kitchen and the kitchen-terraform plugin, so please consider adding tests to your merge requests.
The integration tests require ruby and companion tools like bundler to be available on the command line.
cd integration-testing
touch secrets.tfvars
# edit secrets.tfvars to define terraform variables: 'aws_access_key_id', 'aws_secret_access_key' and 'aws_region'
bundle install # to install test tooling
bundle exec kitchen converge # to provision the test infrastructure on AWS
bundle exec kitchen verify # to run the tests
bundle exec kitchen destroy # to destroy the test infrastructure
When creating the infrastructure to test agains (kitchen converge
), kitchen requires the public IP of the instance being created by the autoscaling group.
Since terraform is not managing the instances directly, a data source is used in the test fixture to fetch the IP.
data "aws_instances" "test" {
instance_tags {
Environment = "${var.name}"
}
depends_on = ["module.ecs_cluster"]
}
The explicit dependency on module.ecs_cluster
is added to enforce that this data source should only be computed after the cluster is built (which doesn't mean that the instance will be ready, so it sometimes fails and needs a retry).
However, when modifying the launch configuration of the autoscaling group, terraform will report an dependency cycle.
In that case one needs to comment out the explicit dependency to allow terraform to make the modification.
- Copyright December 2017 - to date, Open Source DevEx and the 'open-source-devex/terraform-modules/aws/ecs-cluster' contributors
- Copyright March 2017 - December 2017, Azavea, Inc and the 'azavea/terraform-aws-ecs-cluster' contributors