fix: 1568 Fixing S3 Connection Building #1580

Draft: wants to merge 1 commit into main
Conversation

irishgordo
Contributor

  • Fixes S3 connection building by waiting for a short period so that the secret for longhorn-system can be created on the Harvester side.

Resolves: fix/1568
See also: #1568

Which issue(s) this PR fixes:

Fixes #1568

What this PR does / why we need it:

Adds a short delay before yielding the backup config back to the tests, since Harvester in v1.4.0-rc1 usually takes a bit longer to create the secret in longhorn-system.
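As a hedged illustration of this pattern, here is a minimal sketch of a pytest fixture that configures the backup target, pauses, and then yields the config. The `api_client.settings.update` call, its return shape, and the fixture parameters are assumptions made for this sketch, not code copied from the suite.

```python
from time import sleep

import pytest


@pytest.fixture
def config_backup_target(api_client, backup_config):
    backup_type, config = backup_config
    # Assumed call: push the backup-target setting through the API client.
    code, data = api_client.settings.update("backup-target", config)
    assert 200 <= code < 300, f"Failed to update backup target: {data}"
    # Give Harvester a moment to create the backup-target secret in
    # longhorn-system before handing the config back to the tests.
    sleep(5)
    yield backup_type, config
```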

Additional documentation or context

In the video, the issue outlined in #1568 is reproduced on an external two-node bare-metal HP DL160 v1.4.0-rc1 cluster. We are moving too fast when building the fixture, and the secret has not yet been created on the Harvester side in time. If we add a short sleep so Harvester can catch up and create the secret before the spec is yielded back, everything works. The video illustrates both cases back to back.

s3-connection-building-issue.mp4

* fixes S3 connection building, allowing us to wait for a short period
  of time so that the secret for longhorn-system can be created on the
  Harvester side

Resolves: fix/1568
See also: harvester#1568
@@ -121,6 +121,8 @@ def config_backup_target(api_client, conflict_retries, backup_config, wait_timeo
            f'Failed to update backup target to {backup_type} with {config}\n'
            f"API Status({code}): {data}"
        )
+    # sleeping to allow longhorn secret to be built on Harvester side
+    sleep(5)
Collaborator

Do we have any way of checking the secret existence? Can we check that in a loop with some sleep?
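For illustration, a minimal sketch of that polling approach, with `secret_exists` as a hypothetical callable (for example, a Kubernetes API lookup of the backup-target secret in longhorn-system); none of these names are the project's actual helpers.

```python
from time import sleep, time


def wait_for_secret(secret_exists, timeout=60, interval=5):
    """Poll until the backup-target secret appears in longhorn-system."""
    deadline = time() + timeout
    while time() < deadline:
        if secret_exists():   # hypothetical check, e.g. a k8s secrets lookup
            return True
        sleep(interval)       # brief pause between checks
    return False
```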

Contributor

Could we check the status of the backup target in a loop, using something like api_client.settings.backup_target_test_connection? Will that return true if the secret isn't there?
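As a sketch of that idea, assuming `api_client.settings.backup_target_test_connection()` returns a `(status_code, data)` pair like other calls in this suite; that return shape, and whether the call fails while the secret is still missing, are assumptions that would need to be verified.

```python
from time import sleep


def wait_for_backup_target(api_client, retries=12, interval=5):
    """Retry the backup-target connection test until it succeeds or we give up."""
    code, data = None, None
    for _ in range(retries):
        code, data = api_client.settings.backup_target_test_connection()
        if 200 <= code < 300:   # assumed success criterion
            return code, data
        sleep(interval)
    return code, data
```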

irishgordo marked this pull request as draft on October 22, 2024 17:50
@irishgordo
Contributor Author

I've converted this back to "draft", as the fix will require checking a status on Longhorn, which @lanfon72 was mentioning.
Sleeping is a band-aid, and the problem is really most noticeable with the following setup:

  1. A separately located MinIO (S3-compatible) instance, running on a VM rather than on a MinIO Operator based tenant (so no Kubernetes Helm chart based install of MinIO), reachable over a 1GB symmetrical backbone network and backed by SSD, not NVMe
  2. A separate Harvester cluster, running on bare metal, that does not run the MinIO instance, also on a 1GB symmetrical backbone network and backed by SSD, not NVMe
  3. Running these tests on a separate host/workstation that is likewise on a 1GB symmetrical backbone network

It's not that this test is "failing", per se, because of the test itself; it's the hardware and network setup that is impacting it.

When running locally, or over faster storage and network, test failures happen less frequently.


Successfully merging this pull request may close these issues.

[TEST] TestBackupRestore::test_connection[S3] test fails making consecutive tests fail for backup & restore