feat: scale Juju controllers according to anvil cluster size #30
Should we add information to the README that explains that Juju controllers will grow with the cluster, rather than leaving it transparent to the user?
I just finished another round of review. Apart from the comments, I think we should also add a scale-down process. That can be handled in the same step with extra logic, or the already implemented step can be renamed to something like ScaleUpJujuStep and a second one, ScaleDownJujuStep, introduced. The scale-down will just be remove-unit controller/<unit> and has to run when nodes are removed from the anvil cluster.
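As a rough sketch of the scale-down idea, the commands for such a step could be built as below. The helper name `remove_controller_unit_cmds` and its parameters are hypothetical, and model selection and waiting for the units to disappear are deliberately left out.

```python
def remove_controller_unit_cmds(juju_bin: str, units: list[str]) -> list[list[str]]:
    """Build one `remove-unit controller/<unit>` command per controller unit.

    `juju_bin` is the path to the juju binary and `units` holds the unit
    numbers to remove; both are assumptions made for this illustration.
    """
    return [[juju_bin, "remove-unit", f"controller/{unit}"] for unit in units]
```

Each returned list can be passed to `subprocess.run` by the step that handles node removal.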
…dd custom JujuManifest to remove scaling_args. Add is_skip override to determine whether it needs to be run and determine which machines need controllers
…controllers are removed
This is blocked by: https://bugs.launchpad.net/juju/+bug/2073986
MAX_JUJU_CONTROLLERS - len(self.controller_machines),
)

cmd = [
This command should be wrapped in try/except, since it can fail and we need to know why. This is a failure that can happen because of the linked bug:
ubuntu@maas-1:~$ juju enable-ha -n 3 --to 3
ERROR juju-ha-space is not set and a unique usable address was not found for machines: 0
With the current code, we fail in an unexpected way:
DEBUG Command '['/snap/maas-anvil/x1/juju/bin/juju', 'enable-ha', '-n', '3', '--to', '3']' returned non-zero exit status 1. utils.py:38
Traceback (most recent call last):
File "/snap/maas-anvil/x1/lib/python3.10/site-packages/anvil/utils.py", line 32, in __call__
return self.main(*args, **kwargs)
File "/snap/maas-anvil/x1/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/snap/maas-anvil/x1/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/snap/maas-anvil/x1/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/snap/maas-anvil/x1/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/snap/maas-anvil/x1/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/snap/maas-anvil/x1/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
File "/snap/maas-anvil/x1/lib/python3.10/site-packages/anvil/provider/local/commands.py", line 599, in remove
run_plan(plan, console)
File "/snap/maas-anvil/x1/lib/python3.10/site-packages/sunbeam/jobs/common.py", line 277, in run_plan
result = step.run(status)
File "/snap/maas-anvil/x1/lib/python3.10/site-packages/anvil/commands/juju.py", line 186, in run
process = subprocess.run(
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/snap/maas-anvil/x1/juju/bin/juju', 'enable-ha', '-n', '3', '--to', '3']' returned non-zero exit status 1.
WARNING An unexpected error has occurred. Please run 'maas-anvil inspect' to generate an inspection report. utils.py:43
ERROR Error: Command '['/snap/maas-anvil/x1/juju/bin/juju', 'enable-ha', '-n', '3', '--to', '3']' returned non-zero exit status 1. utils.py:44
ERROR Task was destroyed but it is pending! base_events.py:1758
task: <Task pending name='Task-59' coro=<Connection._pinger.<locals>._do_ping() done, defined at
/snap/maas-anvil/x1/lib/python3.10/site-packages/juju/client/connection.py:599> wait_for=<Future cancelled>
cb=[create_task_with_handler.<locals>._task_result_exp_handler(task_name='tmp', logger=<Logger juju....ction (ERROR)>)() at
/snap/maas-anvil/x1/lib/python3.10/site-packages/juju/jasyncio.py:39]>
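A minimal sketch of the suggested try/except wrapping, assuming the step can return a failure result instead of raising. `StepResult` is a hypothetical stand-in for whatever result type the plan runner expects, and `run_enable_ha` is an illustrative name, not a function from the PR.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class StepResult:
    """Hypothetical stand-in for the result type returned by a step."""

    ok: bool
    message: str = ""


def run_enable_ha(cmd: list[str]) -> StepResult:
    """Run the enable-ha command and report failures with Juju's own stderr."""
    try:
        # capture_output keeps stderr so the real reason can be shown to the user
        subprocess.run(cmd, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError as e:
        reason = (e.stderr or "").strip() or f"exit status {e.returncode}"
        return StepResult(ok=False, message=f"juju enable-ha failed: {reason}")
    except FileNotFoundError as e:
        return StepResult(ok=False, message=f"juju binary not found: {e}")
    return StepResult(ok=True)
```

With this shape, the juju-ha-space error shown above would reach the user as the failure message rather than as an unhandled CalledProcessError.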
Add controllers to machines that lack them once the cluster size is at least 3, increasing the number of controllers each time the cluster grows to an odd number of machines.
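One way to read this scaling rule, sketched under assumptions: a single controller below 3 machines, then the largest odd number not exceeding the cluster size, capped at a maximum. The function names are hypothetical, and the cap is passed in explicitly because the value of MAX_JUJU_CONTROLLERS is not shown here.

```python
def target_controller_count(cluster_size: int, max_controllers: int) -> int:
    """Return the desired number of controllers for a given cluster size."""
    if cluster_size < 3:
        # Assumption: below 3 machines only the bootstrap controller exists.
        return 1
    # Largest odd number that does not exceed the cluster size.
    largest_odd = cluster_size if cluster_size % 2 == 1 else cluster_size - 1
    return min(largest_odd, max_controllers)


def controllers_to_add(cluster_size: int, current: int, max_controllers: int) -> int:
    """Return how many controllers are still missing (never negative)."""
    return max(0, target_controller_count(cluster_size, max_controllers) - current)
```

For example, with a cap of 5, growing from 3 to 4 machines adds nothing, while reaching 5 machines asks for two more controllers.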