Cannot delete instances - image has watchers - not removing #297

Open
VariableDeclared opened this issue Apr 29, 2024 · 3 comments

VariableDeclared commented Apr 29, 2024

Hello

When I try to delete instances on MicroCloud, the deletion fails with the following error:

Error: Failed deleting instance "private-repo-lds-3" in project "REDACTED_PROJECT_NAME": Error deleting storage volume: Failed to delete volume: Failed to run: rbd --id admin --cluster ceph --pool lxd_remote rm virtual-machine_REDACTED_PROJECT_NAME_private-repo-lds-3.block: exit status 16 (2024-04-29T11:02:41.760+0000 7fe2f4898640 -1 librbd::image::PreRemoveRequest: 0x5563e888b7b0 check_image_watchers: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.)
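To see which client still holds the image open, `rbd status` lists the current watchers. The sketch below is a hypothetical helper (`watcher_addrs` is not part of rbd or LXD) that pulls the client address out of that listing. It assumes watcher lines have the form `watcher=<address>:<port>/<nonce> client.<id> cookie=<n>`, which should be checked against the output of the installed Ceph release.

```shell
#!/bin/sh
# Hypothetical helper: read `rbd status` output on stdin and print the
# address of each watcher. Assumes lines of the form:
#   watcher=192.0.2.10:0/1234567890 client.4125 cookie=1
watcher_addrs() {
    sed -n 's/^[[:space:]]*watcher=\([^ ]*\):[0-9]*\/[0-9]* .*/\1/p'
}

# Usage, reusing the flags and image name from the error above
# (commented out because it needs access to the Ceph cluster):
#   rbd --id admin --cluster ceph --pool lxd_remote status \
#       virtual-machine_REDACTED_PROJECT_NAME_private-repo-lds-3.block | watcher_addrs
```

The printed address points at the cluster member where the process holding the image is most likely running.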

The issue was produced by deploying a set of 14 VMs, with the following config: https://pastebin.canonical.com/p/DmfDtKc6cz/

The VMs were deployed on Friday and left running over the weekend. Destroying them afterwards failed with the above error.

Workaround

  1. sudo ps aux | grep qemu
  2. Identify the process for your VMs
  3. sudo kill ${PID}
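The workaround steps can be narrowed to a single instance with a small filter. This is a minimal sketch, not an LXD tool: `qemu_pids_for` is a hypothetical helper, and it assumes the instance name appears somewhere on the qemu command line, which should be confirmed against the real `ps` output first.

```shell
#!/bin/sh
# Hypothetical helper: read "PID ARGS" lines on stdin and print the PID of
# every process whose executable name contains "qemu" and whose command
# line mentions the given instance name.
# Checking the executable field ($2) keeps the filter from matching itself.
# Note: the name is matched as a plain substring, so "vm-1" also matches "vm-10".
qemu_pids_for() {
    awk -v name="$1" '$2 ~ /qemu/ && index($0, name) > 0 { print $1 }'
}

# Usage (commented out because it acts on live processes):
#   ps -eo pid=,args= | qemu_pids_for private-repo-lds-3
#   sudo kill <PID>
```

Review the printed PIDs before killing anything, since a kill ends the VM process without a clean shutdown.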

Peter

@tomponline
Member

The VMs were found to have crashed. Killing the qemu processes released the rbd volumes, which allowed the deletion to proceed.

@VariableDeclared
Author

VariableDeclared commented Apr 29, 2024

The steps to reproduce this:

  1. Create VMs as described
  2. Using netplan, add a new VLAN (e.g. a VLAN with ID 55) to the bond on which LXD is running its services
  3. Apply VLAN changes
  4. Allow cluster to settle
  5. Attempt removal

These steps are what I can gather happened since I used the environment. I still need to validate this and confirm a minimal reproducer.

Thank you
Peter

@VariableDeclared
Author

Following up here: I am struggling to validate the above steps as a reproducer. I tried adding a VLAN and I do see errors from Ceph, but VM deletion still works.
