This repository has been archived by the owner on Aug 23, 2024. It is now read-only.

add ESXi upgrade option #55

Merged
merged 1 commit into from
Jun 7, 2024

Conversation

Contributor

@empovit empovit commented May 12, 2022

  • enable optionally upgrading ESXi to a newer version (e.g. 6.5->7.0, or a latest patch) before installing vCenter virtual appliance
  • update README accordingly
  • fix spelling

The upgrade code is based on https://github.com/enkelprifti98/packet-esxi-6-7

count = var.update_esxi ? 1 : 0

provisioner "local-exec" {
command = "sleep 250"
Member


sleep 250 is going to introduce flakes.

Perhaps, if we can be certain the node has already started the reboot (and SSH is down before this null_resource starts), the following would work:

resource "null_resource" "reboot_post_upgrade" {
  depends_on = [null_resource.upgrade_nodes]
  count      = var.update_esxi ? length(metal_device.esxi_hosts) : 0
  provisioner "remote-exec" {
    # remote-exec takes `inline` (or `script`/`scripts`); `command` is a local-exec argument
    inline = ["vmware -vl"]
  }
  connection {
    type         = "ssh"
    user         = local.ssh_user
    private_key  = chomp(tls_private_key.ssh_key_pair.private_key_pem)
    timeout      = "250s"
    bastion_host = metal_device.router.access_public_ipv4
    host         = metal_device.esxi_hosts[count.index].access_public_ipv4
    # or we could use the VLAN address and do this after the L2 switch rather than before 
  } 
}

What are your thoughts here, @empovit ?

Member


I'm expecting timeout to act like a retry-timeout here. That may not be the case.
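
One way to get explicit retry behavior without depending on how the connection `timeout` is interpreted is to poll for the SSH port in a bounded loop. The following is only a sketch under assumptions not confirmed in this thread: the resource name `wait_for_esxi_ssh` is hypothetical, the machine running Terraform is assumed to have `bash` and `nc` and to reach the host's public address directly, and an open port 22 is treated as "back up", which says nothing about the state of internal ESXi services. Like the suggestion above, it only makes sense if the reboot has already begun before the resource starts.

```hcl
# Sketch only: resource name, retry counts, and direct reachability are assumptions.
resource "null_resource" "wait_for_esxi_ssh" {
  depends_on = [null_resource.upgrade_nodes]
  count      = var.update_esxi ? length(metal_device.esxi_hosts) : 0

  provisioner "local-exec" {
    interpreter = ["/bin/bash", "-c"]

    # Pass the address through the environment so the script needs no interpolation.
    environment = {
      ESXI_HOST = metal_device.esxi_hosts[count.index].access_public_ipv4
    }

    # Try port 22 up to 50 times with a 5-second pause between attempts;
    # fail the apply if the port never opens.
    command = <<-EOT
      for attempt in $(seq 1 50); do
        if nc -z -w 5 "$ESXI_HOST" 22; then
          exit 0
        fi
        sleep 5
      done
      echo "ESXi host $ESXI_HOST did not open port 22 in time" >&2
      exit 1
    EOT
  }
}
```

The loop fails loudly instead of continuing after a fixed delay, so a slow reboot surfaces as an error rather than as a flake in a later step.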

Contributor Author


@displague you're absolutely right! I didn't really like this part of the original script by Equinix (linked from "Using VMware ESXi on Equinix Metal", which no longer seems to be available), but I didn't want to introduce too many changes at once - and this procedure looks tried and true. Plus, there are also 25f75e6 and cd35a47 that I'd like to merge at some point.

I've started exploring ways to implement a health check for the two reboots (pre & post upgrade), but haven't come up with usable code yet. IMO ssh-ing to ESXi isn't enough, and the state of internal services must be checked.

The question is whether you'd like to wait for proper health checks instead of the sleeps, or whether it's OK to merge the PR and improve it iteratively, especially since this feature isn't widely used.

Also, if you're rolling out 7.0 support for all machine types (I don't know if you are; I badly need it for g2.large.x86), then maybe the upgrade feature isn't needed after all.

WDYT?

main.tf Outdated
}
}

data "template_file" "upgrade_script" {
Member


main.tf Outdated
@@ -161,6 +218,27 @@ resource "metal_port" "esxi_hosts" {
}
}

data "template_file" "vars_file" {
Member


The templatefile function is preferred:

https://registry.terraform.io/providers/hashicorp/template/latest/docs/data-sources/file

(this change replaces templatefile with the template_file data source)
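
For illustration, the function form might look like the sketch below. The template path, the variable name `esxi_update_filename`, and the template variable it feeds are hypothetical; only the `upgrade_script` name and the default value come from this PR.

```hcl
# Hypothetical variable name; the default is the one used in this PR.
variable "esxi_update_filename" {
  type    = string
  default = "ESXi-7.0U3d-19482537-standard"
}

locals {
  # templatefile() renders the file at plan time and needs no extra provider.
  # The path and the template variable name are assumptions for illustration.
  upgrade_script = templatefile("${path.module}/templates/upgrade_esxi.sh.tpl", {
    esxi_update_filename = var.esxi_update_filename
  })
}
```

Any reference to `data.template_file.upgrade_script.rendered` would then become `local.upgrade_script`.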

Member

@displague displague left a comment


I've left some templatefile comments so this can be revisited after #60.

You can check all ESXi update versions/filenames here: https://esxi-patches.v-front.de/
EOF
type = string
default = "ESXi-7.0U3d-19482537-standard"
Member

@displague displague May 9, 2023


Looks like the latest is now ESXi-7.0U3m-21686933-standard
https://esxi-patches.v-front.de/ESXi-7.0.0.html

(just noting this as I take this PR for a test drive with some of the other open PRs all merged together)

https://github.com/equinix/terraform-metal-vsphere/pull/new/empovit-prs

null_resource.apply_esx_network_config timed out after 5m

null_resource.upgrade_nodes failed with:

null_resource.upgrade_nodes[1] (remote-exec): Connection failed
null_resource.upgrade_nodes[1] (remote-exec): Failed to login: Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.

Member


It looks like I'm running into 5m timeouts with the network config resources regardless of this PR.

null_resource.apply_esx_network_config[0]
null_resource.apply_esx_network_config[1]

@empovit
Contributor Author

empovit commented May 9, 2023

Thanks @displague for reviewing the PR! Unfortunately, it was created a year ago and I don't have time to troubleshoot any new problems right now. Feel free to close it, or put it on hold until I can work on it. Sorry about that :(

- enable optionally upgrading ESXi to a newer version (e.g. 6.5->7.0,
  or a latest patch) before installing vCenter virtual appliance
- update README accordingly
- fix spelling
@displague displague merged commit e1265a4 into equinix:main Jun 7, 2024
1 of 2 checks passed
@empovit empovit deleted the update-esxi branch June 17, 2024 08:05

2 participants