-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resiliency for ArmadaOperator services calls #2591
Conversation
Codecov ReportPatch coverage has no change and project coverage change:
Additional details and impacted files@@ Coverage Diff @@
## master #2591 +/- ##
===========================================
- Coverage 58.78% 34.38% -24.41%
===========================================
Files 238 112 -126
Lines 30568 3467 -27101
Branches 0 470 +470
===========================================
- Hits 17970 1192 -16778
+ Misses 11223 2192 -9031
+ Partials 1375 83 -1292
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -21,6 +21,7 @@ | |||
from armada_client.k8s.io.api.core.v1 import generated_pb2 as core_v1 | |||
from armada_client.permissions import Permissions | |||
from armada_client.typings import JobState | |||
import tenacity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you will need to add tenacity to our pyproject.toml file as this is a new dependency.
You will also need to run the python code formatter (Black via tox -e format-code
) to get your code to pass CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove tenacity from the client? Not sure I like this change being enforced on all users of the client.
I’d leave the changed in jobservice. Can you also experiment with deploying jobservice with multiple replicas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didnt get you but in the comments of the issue , It was mentioned that we need to include backoff for all the clients(ArmadaClient & JobServiceClient).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for removing the client side. The main user of this is mostly for jobservice and I'm not sure we want to add retries to all users of the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @kannon92 , I added the test for tenacity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good. Maybe you can work on having replicas for jobservice in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to discuss and decide if we want the behavior of JobServiceClient
and JobServiceAsyncIOClient
to majorly diverge as that's the effect of this PR as it stands. I haven't looked at tenacity
yet. Does it support asyncio retry as well? If so, can you add this to JobServiceAsyncIOClient
as well?
Hey @ClifHouck , I found out that tenacity works fine with async functions as well , So I added the functionality for JobServiceAsyncIOClient as well. |
@@ -18,6 +20,11 @@ class JobServiceClient: | |||
def __init__(self, channel): | |||
self.job_stub = jobservice_pb2_grpc.JobServiceStub(channel) | |||
|
|||
@tenacity.retry( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd love some kind of test that can verify that this library works with GRPC.
Can you look into adding a unit test? You could use our mock python grpc class for jobservice.
Otherwise this is looking really good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test should check whether tenacity makes retries or not , right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
Hey @pavan12395 We got a request form our user for this feature to go a different way. We are going to close this PR. Nice work on it so far! We decided to go with GRPC retries rather than tenacity. |
This Pull Request makes the following changes
Fixes issue #2483
┆Issue is synchronized with this Jira Task by Unito