Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: reduce warm up time #72

Merged
merged 14 commits into from
Oct 23, 2023
Merged

fix: reduce warm up time #72

merged 14 commits into from
Oct 23, 2023

Conversation

rogerthatdev
Copy link
Contributor

@rogerthatdev rogerthatdev commented Jul 28, 2023

Description

This PR:

  • will reduce the provisioning time by replacing the time_sleep resource with a 6 minute timeout using the http data source
  • change it so that google_compute_global_address waits for the compute api to be enabled (lack of consistently caused failure in local development terraform apply)

The http data source will send a request to the end point every 20 seconds until it gets a valid response. It will stop trying after 17 times, totaling to a 6 minute timeout (equal to the time_sleep it is replacing).

Checklist

  • Added steps to reproduce the changes in this pull request
  • Added relevant testing in this pull request
  • Please merge this PR for me once it is approved.

@rogerthatdev rogerthatdev marked this pull request as ready for review July 28, 2023 21:24
infra/main.tf Outdated
@@ -295,7 +298,7 @@ resource "time_sleep" "load_balancer_warm_up_time" {
google_compute_global_forwarding_rule.http
]

create_duration = "370s"
create_duration = "150s"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really appreciate the use of data collection to identify a more optimal number; however, I have found these kinds of propagation delays can vary widely based on regional traffic, systems maintenance, and so on.

If this data was gathered during a single work session, it may be a best case scenario. Is this faster than product readiness guarantees?

Instead of hard-coding this number, might we use https://registry.terraform.io/providers/hashicorp/http/latest/docs/data-sources/http with retries enabled (for 5xx responses) to detect when the forwarding rule is working as expected? (I'm not sure if 5xx errors are returned before it's ready)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't considered that! Thanks, i'll take a look at that suggestion. Converting this PR to a draft because of course the int test failed at resolving the URL 👎🏽

@rogerthatdev rogerthatdev marked this pull request as draft July 28, 2023 21:48
@rogerthatdev rogerthatdev marked this pull request as ready for review July 28, 2023 23:09
@rogerthatdev rogerthatdev marked this pull request as draft August 1, 2023 18:10
@rogerthatdev rogerthatdev marked this pull request as ready for review August 1, 2023 20:03
Copy link
Contributor

@grayside grayside left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with comments. I've had this in mind as a theory for a few weeks, thanks for showing an implementation!


data "http" "load_balancer_warm_up" {
url = "http://${google_compute_global_address.default.address}/"
# Attempt retry every 20 seconds 17 times, totaling to a 6 minute timeout.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: why 6 minutes? Depending on the answer, that may be good to add to the inline comment because it would help future maintainers understand how to repeat the reasoning if the number should be changed in the future.

Comment on lines +303 to +307
retry {
attempts = 17
max_delay_ms = 20000
min_delay_ms = 20000
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussion: Was curious if there were any surprises here, I think these parameters are likely being used with https://github.com/hashicorp/go-retryablehttp/blob/v0.7.4/client.go#L520C23-L520C23. Based on my reading, min_delay_ms would not normally be linear, but setting the max to the same value caps the exponential computation.

No op here, just thought that may be interesting if others haven't delved that far.

depends_on = [
google_compute_global_address.default,
google_compute_url_map.default,
google_compute_backend_service.default,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: google_compute_url_map.default should have an implicit dependency on google_compute_backend_service.default if you want one fewer explicit dependency here, but clarifying we're waiting on all the load balancer resources may be a good thing given how complicated this is.

Alright, I've talked myself out of this being a suggestion and made it a "note".

@@ -16,7 +16,7 @@

output "frontend_url" {
description = "IP address to site. Load balancer expected to take ~5 minutes for it to warm up."
value = "http://${google_compute_global_address.default.address}/"
value = data.http.load_balancer_warm_up.url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Nice simplification of the output configuration based on the http data.

@rogerthatdev rogerthatdev merged commit e7d7677 into main Oct 23, 2023
8 checks passed
@rogerthatdev rogerthatdev deleted the timesleeps branch October 23, 2023 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants