-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upload cancellation causes long retry loop #3268
Comments
I can attach full logs from actual execution of that script with and without signal handler, each about 10MB in size. |
Thanks for reaching out. Can you share the full logs (with any sensitive info redacted) by adding I'll link the Boto3 docs on retries and timeout for Config for reference, although I see you already have those applied in your snippet. I'm wondering if specifying a lower timeout and |
Attaching the log with debug output: log-exc5-handler.log.gz Reducing the delay is not my goal, the retries are configured reasonably in our case and there is no reason to change those. What I want is for transfer to stop instantly without any delay after the signal is received and exception is raised, there is no reason to retry in that case. Hitting Ctrl-C at a correct moment means that moment should happen when upload has started but before it has finished, i.e. when the threads are actively pushing the data. That window may be short depending on the size of the file. I used ~100MB file in my case which gave me relatively good chance. |
Thanks for following up. It looks like boto/s3transfer#249 may be related. We can plan to bring this issue up with the team for further review. |
Describe the bug
I'm not exactly sure if this is
botocore
ors3transfer
issue, more probably some inconsistency between the two. What I observe is that when there is an asynchronous exception (Control-C or similar "random" exception that we use to terminate the application) during S3 upload, the upload does not stop immediately, but instead it hangs for longer than a minute. This happens with multithreaded upload, single-threaded upload does not show the same issue.Regression Issue
Expected Behavior
I would expect that Control-C or other random exception should stop upload almost immediately.
Current Behavior
Running with debug logging enabled I think I see what happens:
__exit__
method is called which callsTransferManager.__exit__()
TransferManager._shutdown
with exception typeFatalError
orCancelledError
depending on the type of original exception.cancel()
method passing the exception type to them.Here is an example of traceback dumped by boto from one of the threads when it goes into a retry loop:
Reproduction Steps
Here is the code that I used to debug the issue, the code is actually a trivial call to
file_upload
method. The trick is to trigger upload cancellation by hitting Ctrl-C at a correct moment. In our setup I used ~100MB file which gave me a reasonable time window during the upload. If you are lucky that after hitting Ctrl-C you should see enormous amount of tracebacks printed like the one I included above.Possible Solution
Maybe retry handler should understand that certain kinds of exceptions should result in immediate failure?
Additional Information/Context
Package versions:
SDK version used
1.34.131
Environment details (OS name and version, etc.)
Red Hat Enterprise Linux release 8.6 (Ootpa), kernel 4.18.0-372.32.1.el8_6.x86_64
The text was updated successfully, but these errors were encountered: