
Removing a MicroCloud cluster member does not remove the underlying LXD cluster member #160

Open
gabrielmougard opened this issue Sep 18, 2023 · 6 comments
Labels
Feature New feature, not a bug

Comments

@gabrielmougard
Contributor

gabrielmougard commented Sep 18, 2023

I have a simple 3-node cluster configured like so:

root@v3:~# microcloud cluster list
+------+-------------------+-------+------------------------------------------------------------------+--------+
| NAME |      ADDRESS      | ROLE  |                           FINGERPRINT                            | STATUS |
+------+-------------------+-------+------------------------------------------------------------------+--------+
| v1   | 10.10.10.67:9443  | voter | 3d4140ec40d677b2a9a4870511b144f795578f0007d32cdef962a177cf152286 | ONLINE |
+------+-------------------+-------+------------------------------------------------------------------+--------+
| v2   | 10.10.10.217:9443 | voter | 621fe0a5e252b80764fc0528e269046ff583d4e52ac17f980fdbf71a177890e6 | ONLINE |
+------+-------------------+-------+------------------------------------------------------------------+--------+
| v3   | 10.10.10.86:9443  | voter | 0967c4417e555d1bf79f345ffaa6c6c1eb1b0e8ddd73b682980860f689f998e4 | ONLINE |
+------+-------------------+-------+------------------------------------------------------------------+--------+

When I remove a MicroCloud node, with microcloud cluster remove v3 for example, this works as expected (e.g. I can go on v2 and list the remaining MicroCloud members):

root@v2:~# microcloud cluster list
+------+-------------------+-------+------------------------------------------------------------------+--------+
| NAME |      ADDRESS      | ROLE  |                           FINGERPRINT                            | STATUS |
+------+-------------------+-------+------------------------------------------------------------------+--------+
| v1   | 10.10.10.67:9443  | voter | 3d4140ec40d677b2a9a4870511b144f795578f0007d32cdef962a177cf152286 | ONLINE |
+------+-------------------+-------+------------------------------------------------------------------+--------+
| v2   | 10.10.10.217:9443 | spare | 621fe0a5e252b80764fc0528e269046ff583d4e52ac17f980fdbf71a177890e6 | ONLINE |
+------+-------------------+-------+------------------------------------------------------------------+--------+

But on every node, if I run lxc cluster list, I still see all the members:

root@v3:~# lxc cluster list
+------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| NAME |            URL            |      ROLES      | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE  |      MESSAGE      |
+------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| v1   | https://10.10.10.67:8443  | database        | x86_64       | default        |             | ONLINE | Fully operational |
+------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| v2   | https://10.10.10.217:8443 | database        | x86_64       | default        |             | ONLINE | Fully operational |
+------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| v3   | https://10.10.10.86:8443  | database-leader | x86_64       | default        |             | ONLINE | Fully operational |
|      |                           | database        |              |                |             |        |                   |
+------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+

This behaviour is not symmetric with microcloud init, which creates the underlying LXD cluster members. I would expect microcloud cluster remove <node_name> to remove the underlying LXD cluster member (the one listed by lxc cluster list) as well.

I'm also curious how this behaves with MicroCeph/MicroOVN: does microcloud cluster remove <node_name> trigger an automatic microceph cluster remove <node_name> / microovn cluster remove <node_name> as well? I don't know what the expected behaviour is here, but I'd say that if we remove a MicroCloud node, we would also want to remove its associated member in the MicroCeph / MicroOVN clusters, since they are meant to work together.

@tomponline
Member

@masnax @markylaing do you know what the expected behaviour here is? Thanks

@markylaing
Contributor

It looks like the CLI only removes the microcluster member and does not make any calls to LXD, Ceph, or OVN:

func (c *cmdClusterMemberRemove) Run(cmd *cobra.Command, args []string) error {
	if len(args) != 1 {
		return cmd.Help()
	}

	// Load the local MicroCloud daemon from its state directory.
	options := microcluster.Args{StateDir: c.common.FlagMicroCloudDir, Verbose: c.common.FlagLogVerbose, Debug: c.common.FlagLogDebug}
	m, err := microcluster.App(context.Background(), options)
	if err != nil {
		return err
	}

	client, err := m.LocalClient()
	if err != nil {
		return err
	}

	// Only the MicroCloud (microcluster) member is deleted; LXD,
	// MicroCeph and MicroOVN are never contacted.
	err = client.DeleteClusterMember(context.Background(), args[0], c.flagForce)
	if err != nil {
		return err
	}

	return nil
}

I agree with @gabrielmougard that this should remove the node from all of them. We will need to figure out what to do with running instances, especially those on local storage.
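
For the LXD side specifically, a minimal sketch of what the missing call could look like, using the LXD Go client's DeleteClusterMember (the helper name, socket path, and error wrapping below are assumptions for illustration, not existing MicroCloud code):

package sketch

import (
	"fmt"

	lxd "github.com/canonical/lxd/client"
)

// removeLXDMember sketches the missing LXD-side call: connect to the local
// LXD daemon over its Unix socket and delete the named cluster member.
// The snap socket path below is an assumption; adjust for your install.
func removeLXDMember(name string, force bool) error {
	c, err := lxd.ConnectLXDUnix("/var/snap/lxd/common/lxd/unix.socket", nil)
	if err != nil {
		return fmt.Errorf("failed to connect to LXD: %w", err)
	}

	// LXD itself refuses to remove a member that still hosts instances
	// unless force is set, which lines up with the --force discussion below.
	return c.DeleteClusterMember(name, force)
}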

@gabrielmougard
Contributor Author

gabrielmougard commented Sep 18, 2023

@markylaing there is also #33, which previously mentioned the problem we're trying to solve.

@masnax
Contributor

masnax commented Sep 18, 2023

I think it would be fair to error out when trying to remove a node that still has local instances. The user should sort out what to do with those instances before removing the node. Maybe a force flag can nuke the node and its instances if it's unresponsive. Ceph-backed instances can be moved, though that raises the question of whether the move should follow LXD's cluster scheduling or be user-defined.

I think it would make sense for the time being to look into adding a Remove function for each service that calls the respective cluster remove API hook.

- Supposedly MicroOVN fully supports this already, so that one is straightforward.
- LXD can check for local instances and fail if --force is not given.
- MicroCeph won't work for now though, so we will need to error out if that's installed.

We could have an IsRemovable function that performs these validations on all services before progressing to the Remove step.
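
Roughly the shape that could take; the Service interface and removeMember helper below are hypothetical names for illustration, not existing MicroCloud code:

package sketch

import (
	"context"
	"fmt"
)

// Service is a hypothetical per-service handler (LXD, MicroCeph, MicroOVN)
// capturing the IsRemovable/Remove split proposed above.
type Service interface {
	Name() string

	// IsRemovable validates that the member can leave this service's
	// cluster, e.g. LXD would check for local instances, and MicroCeph
	// would simply refuse while removal is unsupported.
	IsRemovable(ctx context.Context, member string, force bool) error

	// Remove calls the service's respective cluster remove API hook.
	Remove(ctx context.Context, member string, force bool) error
}

// removeMember validates against every service first, so a failure on one
// service aborts before any cluster state has been changed anywhere.
func removeMember(ctx context.Context, services []Service, member string, force bool) error {
	for _, s := range services {
		if err := s.IsRemovable(ctx, member, force); err != nil {
			return fmt.Errorf("cannot remove %q from %s: %w", member, s.Name(), err)
		}
	}

	for _, s := range services {
		if err := s.Remove(ctx, member, force); err != nil {
			return fmt.Errorf("failed to remove %q from %s: %w", member, s.Name(), err)
		}
	}

	return nil
}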

@tomponline
Member

Sounds good!

@markylaing
Contributor

> I think it would make sense for the time being to look into adding a Remove function for each service that calls the respective cluster remove API hook.
>
> Supposedly MicroOVN fully supports this already, so that one is straightforward.

MicroOVN uses a microcluster hook to define how a member is removed.

Since MicroCeph also uses microcluster, it can do the same. We'll just need to implement the logic for LXD.
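
To make that concrete, here is a small self-contained sketch of the hook pattern; the Hooks type is defined locally and only mirrors the rough shape of microcluster's lifecycle hooks, since the real field names and signatures vary between microcluster versions:

package main

import "fmt"

// Hooks loosely mirrors the shape of microcluster's lifecycle hooks; this
// local type is illustrative only, not the real microcluster API.
type Hooks struct {
	// PreRemove runs on a member before it leaves the cluster, giving
	// each service a place to tear down its own state first.
	PreRemove func(memberName string, force bool) error
}

func main() {
	// A MicroOVN-style daemon registers its cleanup here; MicroCeph, also
	// being microcluster-based, could do the same, and MicroCloud's hook
	// could in turn call LXD's DeleteClusterMember (see the sketch above).
	hooks := Hooks{
		PreRemove: func(memberName string, force bool) error {
			fmt.Printf("tearing down services on %q (force=%v)\n", memberName, force)
			return nil
		},
	}

	if err := hooks.PreRemove("v3", false); err != nil {
		fmt.Println("removal blocked:", err)
	}
}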
