Conversation
count = var.update_esxi ? 1 : 0

provisioner "local-exec" {
  command = "sleep 250"
}
sleep 250 is going to introduce flakes.
Perhaps, if we can be certain the node has already started the reboot (and SSH is down before this null_resource starts), the following would work:
resource "null_resource" "reboot_post_upgrade" {
depends_on = [null_resource.upgrade_nodes]
count = var.update_esxi ? length(metal_device.esxi_hosts) : 0
provisioner "remote-exec" {
command = "vmware -vl"
}
connection {
type = "ssh"
user = local.ssh_user
private_key = chomp(tls_private_key.ssh_key_pair.private_key_pem)
timeout = "250s"
bastion_host = metal_device.router.access_public_ipv4
host = metal_device.esxi_hosts[count.index].access_public_ipv4
# or we could use the VLAN address and do this after the L2 switch rather than before
}
}
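(vmware -vl just prints the ESXi version banner, so it doubles as a cheap probe that the host is back up and answering over SSH.)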
What are your thoughts here, @empovit?
I'm expecting timeout to act like a retry-timeout here. That may not be the case.
@displague you're absolutely right! I didn't really like this part of the original script by Equinix (linked from "Using VMware ESXi on Equinix Metal", which no longer seems to be available), but I didn't want to introduce too many changes at once, and this procedure looks tried and true. Plus, there are also 25f75e6 and cd35a47 that I'd like to merge at some point.
I've started exploring ways to implement a health check for the two reboots (pre & post upgrade), but haven't come up with usable code yet. IMO SSH-ing into ESXi isn't enough; the state of internal services must be checked as well.
The question is whether you'd like to wait for proper health checks instead of the sleeps, or whether it's OK to merge the PR and improve it iteratively, especially since this feature isn't widely used.
Also, if you're rolling out 7.0 support for all machine types (I don't know, but I badly need it for g2.large.x86), then maybe the upgrade feature isn't needed after all.
WDYT?
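To make the internal-services point concrete, one possible shape for such a check is a remote-exec that polls the host agents until they report running. This is an untested sketch; which services are worth polling (hostd/vpxa here) is exactly the open question in this thread, and the resource reuses the names from the snippet above:

resource "null_resource" "wait_for_esxi_services" {
  depends_on = [null_resource.upgrade_nodes]
  count      = var.update_esxi ? length(metal_device.esxi_hosts) : 0

  provisioner "remote-exec" {
    # Poll the core ESXi agents every 10s, giving up after ~5 minutes.
    inline = [
      "i=0; until /etc/init.d/hostd status && /etc/init.d/vpxa status; do i=$((i+1)); [ $i -ge 30 ] && exit 1; sleep 10; done",
    ]
  }

  connection {
    type         = "ssh"
    user         = local.ssh_user
    private_key  = chomp(tls_private_key.ssh_key_pair.private_key_pem)
    timeout      = "250s"
    bastion_host = metal_device.router.access_public_ipv4
    host         = metal_device.esxi_hosts[count.index].access_public_ipv4
  }
}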
main.tf (outdated)
data "template_file" "upgrade_script" {
The templatefile function is preferred: https://registry.terraform.io/providers/hashicorp/template/latest/docs/data-sources/file
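As a sketch of the swap (the template path and variable names below are illustrative, not taken from this PR), a data source like

data "template_file" "upgrade_script" {
  template = file("${path.module}/templates/upgrade.sh.tpl")
  vars = {
    esxi_version = var.esxi_version
  }
}

can be collapsed into the built-in function, which needs no provider at all:

locals {
  upgrade_script = templatefile("${path.module}/templates/upgrade.sh.tpl", {
    esxi_version = var.esxi_version
  })
}

References then change from data.template_file.upgrade_script.rendered to local.upgrade_script.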
main.tf (outdated)
@@ -161,6 +218,27 @@ resource "metal_port" "esxi_hosts" {
  }
}

data "template_file" "vars_file" {
The templatefile function is preferred: https://registry.terraform.io/providers/hashicorp/template/latest/docs/data-sources/file
(this change replaces the templatefile function with a template_file data source)
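For what it's worth, templatefile can also be rendered inline at the point of use, with no intermediate data source or local at all. The file and variable names here are illustrative, not from this PR:

provisioner "file" {
  # Render the template directly into the provisioner's content.
  content = templatefile("${path.module}/templates/vars.tpl", {
    vcenter_network = var.vcenter_portgroup_name
  })
  destination = "/tmp/vars"
}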
I've left some templatefile comments so this can be revisited after #60.
  You can check all ESXi update versions/filenames here: https://esxi-patches.v-front.de/
  EOF
  type    = string
  default = "ESXi-7.0U3d-19482537-standard"
Looks like the latest is now ESXi-7.0U3m-21686933-standard: https://esxi-patches.v-front.de/ESXi-7.0.0.html
(Just noting this as I take this PR for a test drive with some of the other open PRs all merged together: https://github.com/equinix/terraform-metal-vsphere/pull/new/empovit-prs)
null_resource.apply_esx_network_config timed out after 5m
null_resource.upgrade_nodes failed with:
null_resource.upgrade_nodes[1] (remote-exec): Connection failed
null_resource.upgrade_nodes[1] (remote-exec): Failed to login: Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.
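As an aside, picking up that newer image profile should just be a matter of overriding the variable whose default is shown earlier, e.g. in terraform.tfvars. update_esxi appears in the diff above; the version variable's name isn't visible in this excerpt, so esxi_update_filename is a guess:

# esxi_update_filename is a guessed name; check the module's variables.tf
update_esxi          = true
esxi_update_filename = "ESXi-7.0U3m-21686933-standard"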
It looks like I'm running into 5m timeouts with the network config resources regardless of this PR:
null_resource.apply_esx_network_config[0]
null_resource.apply_esx_network_config[1]
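(For what it's worth, 5m is Terraform's default for a connection block's timeout, so these resources are presumably relying on the default and the hosts are simply taking longer than that to come back; raising timeout on their connection blocks may be enough of a workaround.)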
Thanks @displague for reviewing the PR! Unfortunately, it was created a year ago and I don't have time to troubleshoot any new problems right now. Feel free to close it, or put it on hold until I can work on it. Sorry about that :(
- enable optionally upgrading ESXi to a newer version (e.g. 6.5 -> 7.0, or the latest patch) before installing the vCenter virtual appliance
- update README accordingly
- fix spelling
The upgrade code is based on https://github.com/enkelprifti98/packet-esxi-6-7