ecs-ami-deploy
is a library of code for intelligently replacing instances in an autoscaling group (ASG) with
the latest ECS Optimized image.
Based on a couple assumptions, this library can replace all instances in an ECS cluster without causing downtime for
any of the running services.
This process is available as:
- A Go Module:
import "github.com/silinternational/ecs-ami-deploy/v2"
- A command line application. See
cli/
directory
AWS releases new ECS optimized AMIs on a pretty frequent basis. It's generally a good idea to keep up to date with releases for security, performance, and other enhancements. However, replacing instances can be a complicated dance if you also want to maintain a high level of availability for the services you're running on the instances to be replaced. AWS has a feature to "refresh" instances in an autoscaling group, but unfortunately it isn't "ECS aware". Rather, it removes EC2 instances from the autoscaling group, terminates them, and replaces them. When the new instances are "healthy" it moves on to replace more instances. It does not take into account if the services you have running in ECS are stable yet when removing the next instance. As a result, refreshing instances in an ASG using the built-in feature can and will likely cause downtime for your ECS services.
That isn't really an acceptable solution for us, so we created ecs-ami-deploy
to refresh instances with
an awareness of the services running in ECS and to replace EC2 instances only when ECS services are stable. This
process does assume more than one task per service is running so that when an EC2 instance is removed, ECS
will launch a new task on a different instance while other tasks for this service remain where they are. Then
one by one, EC2 instances are removed from service and terminated, and only when all services are stable again, that is
they have zero pending tasks, then the next EC2 instance can be removed from service and so forth.
Gracefully replacing instances can take some time, especially for clusters with many instances supporting them. The process was designed to be fault-tolerant and to pick up where it left off should it terminate. One of the ways this is accomplished is that all EC2 instances in the ASG are tagged before being detached and terminated. On successive runs the process looks for tagged instances for the given cluster that are no longer in service and continues the graceful termination process while monitoring the ECS cluster services for stability.
If --force-replacement
is enabled, the process will always replace all instances whether there is a newer AMI
available or not. When --force-replacement
is enabled the process is not idempotent.
- Look up latest AMI based on either the given AMI filter, or the default:
amzn2-ami-ecs-hvm-*-x86_64-ebs
- Identify the ASG for the given ECS cluster to get current launch template and instances list
- Compare latest AMI with AMI in use by launch template
- If cluster is not using latest AMI, or
force replacement
is enabled, proceed to #4 - Else if using latest AMI already, jump to #10
- If cluster is not using latest AMI, or
- Create new launch template version with new AMI
- Update launch template default version and set ASG to use the latest template version (
"$Latest"
) - Detach existing instances from ASG and replace with new ones
- Wait for new instances to reach
InService
state with ASG - Watch ECS cluster instances until all new ones are registered and available
- For each old instance that needs to be removed:
- Deregister one instance from ECS cluster
- Wait for zero pending tasks in cluster
- Terminate old ASG EC2 instance
- Scan all EC2 instances for any instances tagged for termination as part of this operation in case any
were missed on a previous run due to timeout or something else. For each:
- Terminate instance
- Wait for zero pending tasks in cluster
- Consistentify logging vs. returning errors
- Add a logger that can send output to an email as well as stdout
- Create Lambda wrapper and provide trigger examples for schedule and SNS when newer AMI is released
- Grab the latest binary for your platform at https://github.com/silinternational/ecs-ami-deploy/releases
- The CLI makes use of AWS's SDK for Go, which can load authentication credentials from various places similar to the AWS CLI itself
- Run
ecs-ami-deploy list-clusters
to check if it's working and what clusters you have available. - If you have multiple profiles configured in your
~/.aws/credentials
file, you can use the-p
or--profile
flags to specify a different profile. - The CLI defaults to region
us-east-1
, you can use the-r
or--region
flags to specify a different region - The CLI has help information built in for the various subcommands and their supported flags, use
-h
or--help
flags with each subcommand for more information.