-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EC2 destroy.yml will terminate other instances if no match is found #121
Comments
hm. sounds bad. Can you please submit a PR ? |
Still working on it... |
I've been able to reproduce the behavior, although I don't recall exactly how I got to it. There is no instance_config.yml {} (I know - this doesn't look very much like yml!) state.yml # Molecule managed
---
converged: false
created: false
driver: null
is_parallel: false
prepared: null
run_uuid: eceff0f9-52b6-40ad-8b96-778f43a3a0a8 The really scary play is - name: Destroy ephemeral EC2 instances
ec2_instance:
profile: "{{ item.aws_profile | default(omit) }}"
region: "{{ item.region | default(omit) }}"
instance_ids: "{{ instance_config | map(attribute='instance_ids') | flatten }}"
state: absent
loop: "{{ platforms }}"
loop_control:
label: "{{ item.name }}"
register: ec2_instances_async
async: 7200
poll: 0 because adding this play immediately before it - name: list of instance ids...
debug:
msg: "{{ instance_config | map(attribute='instance_ids') | flatten }}"
loop: "{{ platforms }}"
loop_control:
label: "{{ item.name }}" results in
If I didn't immediately 'fail' I would have seen msg: 'Unable to terminate instances: An error occurred (OperationNotPermitted) when calling the TerminateInstances operation: The instance ''i-0e19535f5a288e6fc'' may not be terminated. Modify its ''disableApiTermination'' instance attribute and try again.'
results_file: /home/bgiles/.ansible_async/255023698564.4155266
started: 1
stderr: ''
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
terminate_failed:
- i-011529bd170f8d765
- i-0aaa746642d96f508
- i-063bd9a60e1168700
- i-0def08ff5a17bf1d0
- i-016c8680ff0fb57cf
- i-0f8848b6cb98e6a5c
- i-06b46dddb84820d9a
- i-0aa3b858d938d79ab
- i-02b479461c88a51f6
- i-008723f22d5dd9bef
- i-0e19535f5a288e6fc
- i-0974455a156207f96
- i-001732bd76214758a
- i-01cfb1b4b84002204
- i-08ca360a86f590fdb
- i-03df2c30f27a97bae
- i-06ecb13be3e37f6ac
- i-03bb783117c63a926
- i-04b6bde567b32e0bd
- i-044e8d0655937e5db
- i-0266cb409a1947325
- i-0614ac8f419da9021
- i-06c69c1b5c021d610
- i-07474f12b6ad658ba
- i-032b3e77f72f97d1c
- i-0f32470a91df12f8b
- i-0868fe1cddc6a94c2
- i-03f7c088088d2b68b
- i-0812c02988d05499c
- i-0865f1cf78a1c4db1
terminate_success: [] There's an immediate improvement by adding just one line: - name: Destroy ephemeral EC2 instances
ec2_instance:
profile: "{{ item.aws_profile | default(omit) }}"
region: "{{ item.region | default(omit) }}"
instance_ids: "{{ instance_config | map(attribute='instance_ids') | flatten }}"
# --- start of new code ---
subnet_id: "{{ item.vpc_subnet_id }}"
# --- end of new code ---
state: absent
loop: "{{ platforms }}"
loop_control:
label: "{{ item.name }}"
register: ec2_instances_async
async: 7200
poll: 0 This only works if the molecule tests are run in a dedicated subnet... but that would be a Good Thing anyway. It could still cause problems if there's permanent infrastructure in the same subnet, e.g., a centralized log server. It looks like this might work - but it will require changes to at least one other play. - name: Destroy ephemeral EC2 instances
ec2_instance:
profile: "{{ item.aws_profile | default(omit) }}"
region: "{{ item.region | default(omit) }}"
instance_ids: "{{ instance_ids }}"
state: absent
loop: "{{ platforms }}"
loop_control:
label: "{{ item.name }}"
# --- start of new code ---
vars:
instance_ids: "{{ instance_config | map(attribute='instance_ids') | flatten }}"
when:
- instance_ids is defined
- instance_ids | length > 0
# --- end of new code ---
register: ec2_instances_async
async: 7200
poll: 0 Finally, something else I've been looking at is modifying the plays that create ephemeral keys, security groups, and instances so they include one or two additional tags:
And the 'run_config.yml' file also captures the run_id. There's three reasons for this:
(I have a ton of orphaned keys and security groups...) The search can still include criteria like the expected subnet, etc. I've also tried adding any ephemeral keys and groups to the run_config.yml file. The biggest downside is that you now need to delete the ephemeral keys if there was a failure in the create playbook. Otherwise the next run will reuse the existing key - which means it won't be automatically deleted later if you rely on just the run-id value. |
I forgot to mention - I added a bit more code when adding the tags to the EC2 instance. Basically I split 'platform_generated_tags' into two. The filtered tags include what's already present - The unfiltered tags include the I also added the vpc-id, image-id, and key-name values to the filter since that's the default values. The first two should be safe, the third one might not be with ephemeral keys. It won't change the results but it might be a little more efficient on the AWS side. .... Also - I used 'molecule-run-id' instead of just 'run-id' since it's possible that something else might be creating keys and security groups with just 'run-id'. Adding 'molecule-' should make it unique. |
Please resolve this?? |
Sorry I dropped this - I got pulled into other tasks... then laid off. The change set for the unfiltered tag was large enough that I knew it would take some time to prepare it for review. In addition... the more I looked at the security issues and our needs (which potentially include S3, SQS, EKS, etc.) I decided that a better approach for us would be a lambda function. The molecule driver would just perform input validation and then call the lambda function for the actual work. This solves the security issues since the lambda function would have them (via IAM role), not the process calling it. It makes it easier to create a default security group restricted to the host running the molecule tests. Molecule tests running on EC2 instances usually only have their internal IP address but if you use a dedicated VPC for your tests but haven't set up a gateway you would need to use the public IP address. It solves the issue of accidental deletions since the lambda function could maintain a small database of the instances it creates and then check that list before deleting anything. It makes it easier to add support for additional resources since the lambda function is written in java (or python, etc.) instead of ansible. Finally it lets you update the implementation without changing any existing tests. (Once they use the lambda function.) The lambda-based approach should probably be an 'aws' driver, not 'ec2' driver, in order to avoid confusion and to reflect the intention to support much more than just ec2. |
Just a note, this looks like it has been fixed via: #170 A fix for anyone still encountering it would be to update the plugin and delete -> re-generate the destroy scenarios under |
The EC2 driver maintains information about the instances it creates then uses that information to destroy the instance.
This isn't a problem (usually) when running molecule tests to completion. However when using test-driven development (TDD) you may routinely run the test to 'converge', do a bit more work, and then complete the molecule tests to see if the results have improved. There's a significant downside to this - the longer this instance is running the greater the risk that another process will terminate the instance. It might be a local process, it might be another process interacting with EC2.
When this happens (ouch) the code blocks
will quietly set
instance_ids
to null instead of failing. That means the play will match all EC2 instances matching that profile and region.I suspect the playbook had originally verified that the instance(s) was still running as part of the creation of the
platforms
list and that list would have been empty if the instance(s) was no longer running but that's no longer the case.Fortunately there's a simple fix:
with related updates below. (This isn't the final code - just a demonstration that it's easy to check whether the list will be empty.)
PR to follow.
The text was updated successfully, but these errors were encountered: