Have archivetar not immediately fail if Globus is unavailable temporarily #42

Open
brockpalen opened this issue Jan 23, 2024 · 2 comments

@brockpalen
Owner

Currently, if Globus fails (for example, when httpd dies, as it sometimes does), archivetar fails immediately.

The desired outcome is for archivetar to retry for some period of time before giving up.

e.g.:

Unable to connect to <>:443
globus_xio: System error in connect: Connection refused
globus_xio: A system call failed: Connection refused
", 'eHotMF73v')
@brockpalen added the enhancement (New feature or request) label on Jan 23, 2024
@brockpalen
Owner Author

I'm looking at https://github.com/jd/tenacity to implement some retries. I also have an issue open with the Globus team to see if they have anything built in or a best practice.
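
For reference, a minimal sketch of what tenacity-based retries might look like here, assuming a hypothetical wrapper around the transfer submission (the function, client, and timing values below are illustrative, not archivetar's actual code):

# Hypothetical sketch: retry a Globus call with tenacity instead of failing immediately.
from tenacity import retry, retry_if_exception_type, stop_after_delay, wait_fixed
import globus_sdk

@retry(
    # Retry on network-level failures and Globus API errors.
    retry=retry_if_exception_type((globus_sdk.NetworkError, globus_sdk.GlobusAPIError)),
    wait=wait_fixed(30),         # wait 30 seconds between attempts
    stop=stop_after_delay(600),  # give up after ~10 minutes total
    reraise=True,                # re-raise the original exception once we give up
)
def submit_transfer(tc, tdata):
    """Submit a transfer, retrying transient failures."""
    return tc.submit_transfer(tdata)

The wait/stop values are placeholders; the real tradeoff is how long archivetar should block waiting for Globus to come back before surfacing the failure.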

@brockpalen self-assigned this on Feb 6, 2024
@brockpalen
Owner Author

From the Globus team:

The SDK supports timeout and retry customization via the client's .transport attribute, which is an instance of the RequestsTransport class [documentation link].

There are several customization options exposed as attributes, but I think that the following will be helpful in this situation:

    .TRANSIENT_ERROR_STATUS_CODES
    .retry_backoff()
    .max_retries

Looking at the archivetar code, it may be that code like this will accommodate longer retries and enforce retries on HTTP 404:

# After instantiating the TransferClient
# --------------------------------------

# Add HTTP 404 as a status code that should be retried.
self.tc.transport.TRANSIENT_ERROR_STATUS_CODES += (404, )

# Retry once per second, without any backoff.
self.tc.transport.retry_backoff = lambda *_, **__: 1.0

# Allow up to 100 retries.
# This may result in more than 2 minutes of retries.
self.tc.transport.max_retries = 100

This will result in several minutes of retries before an exception is raised.
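
Combining that advice into a self-contained sketch (the access-token authorizer below is only a placeholder for whatever auth flow archivetar actually uses, and the numbers are just the ones from the example above):

# Placeholder auth; archivetar's real client construction will differ.
import globus_sdk

authorizer = globus_sdk.AccessTokenAuthorizer("TRANSFER_ACCESS_TOKEN")
tc = globus_sdk.TransferClient(authorizer=authorizer)

# Treat HTTP 404 as transient, retry once per second, and allow up to 100 retries.
tc.transport.TRANSIENT_ERROR_STATUS_CODES += (404,)
tc.transport.retry_backoff = lambda *_, **__: 1.0
tc.transport.max_retries = 100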
