-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add async observability to QuartzJob #461
Conversation
service/src/main/java/org/databiosphere/workspacedataservice/dataimport/pfb/PfbQuartzJob.java
Outdated
Show resolved
Hide resolved
service/src/main/java/org/databiosphere/workspacedataservice/jobexec/QuartzJob.java
Outdated
Show resolved
Hide resolved
service/src/main/java/org/databiosphere/workspacedataservice/metrics/MetricsDefinitions.java
Outdated
Show resolved
Hide resolved
service/src/main/java/org/databiosphere/workspacedataservice/metrics/MetricsDefinitions.java
Outdated
Show resolved
Hide resolved
service/src/main/java/org/databiosphere/workspacedataservice/metrics/MetricsDefinitions.java
Outdated
Show resolved
Hide resolved
...src/test/java/org/databiosphere/workspacedataservice/dataimport/pfb/PfbQuartzJobE2ETest.java
Outdated
Show resolved
Hide resolved
...ce/src/test/java/org/databiosphere/workspacedataservice/dataimport/pfb/PfbQuartzJobTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few comments/suggestions but nothing showstopper. LGTM with pre-emptive approval!
service/src/main/java/org/databiosphere/workspacedataservice/jobexec/QuartzJob.java
Outdated
Show resolved
Hide resolved
service/src/main/java/org/databiosphere/workspacedataservice/jobexec/QuartzJob.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one question inline, otherwise looking good!
I see this is failing unit tests; it needs some updates for that.
Can you write a test to ensure that QuartzJob
is properly recording metrics? If you pass a @MockBean ObservationRegistry observationRegistry
(or a @SpyBean
), can you verify in the mock/spy that it's doing the right thing?
service/src/main/java/org/databiosphere/workspacedataservice/service/RecordService.java
Outdated
Show resolved
Hide resolved
service/src/main/java/org/databiosphere/workspacedataservice/jobexec/QuartzJob.java
Outdated
Show resolved
Hide resolved
This intends to make the default `ObservationRegistry` automatically wired up to the same `MeterRegistry` provided by springboot.
@ashanhol I saw your question about whether we might need to customize Regarding your question in Slack about whether/how to test this, I think if you merge the PR mentioned here you should be able to adjust |
service/src/test/java/org/databiosphere/workspacedataservice/jobexec/QuartzJobTest.java
Outdated
Show resolved
Hide resolved
service/src/test/java/org/databiosphere/workspacedataservice/jobexec/QuartzJobTest.java
Show resolved
Hide resolved
service/src/test/java/org/databiosphere/workspacedataservice/service/RecordServiceTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments inline, mostly about tests. Good stuff in this PR!
service/src/main/java/org/databiosphere/workspacedataservice/jobexec/QuartzJob.java
Outdated
Show resolved
Hide resolved
Observation.start("wds.job.execute", observationRegistry) | ||
.contextualName("job-execution") | ||
.lowCardinalityKeyValue("jobType", getClass().getSimpleName()) | ||
.highCardinalityKeyValue("jobId", jobId.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How feasible is it to start an Observation
in a separate class - and a separate Java thread - and update the Observation
here?
We queue the job over in ImportService.java:
Line 85 in c6eb725
// schedule the job. after successfully scheduling, mark the job as queued |
and if we wanted to track timing between QUEUED
and RUNNING
status, we'd need to start the Observation over there.
I am totally fine answering this question sometime after this PR merges. What's in this PR as-is is useful and adds value, so it's worth it on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call @davidangb this is actually the spirit behind the measurement, which is to know how long the customer has to wait, so I'm in favor of this. Ok to defer to another PR though if you just want to land this one @ashanhol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually had this exact thought when doing this, but couldn't quite figure out how to "pass" an observation from one class to another. I'm happy to make a ticket to think about it further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
service/src/test/java/org/databiosphere/workspacedataservice/jobexec/QuartzJobTest.java
Show resolved
Hide resolved
service/src/test/java/org/databiosphere/workspacedataservice/jobexec/QuartzJobTest.java
Show resolved
Hide resolved
service/src/test/java/org/databiosphere/workspacedataservice/jobexec/QuartzJobTest.java
Outdated
Show resolved
Hide resolved
service/src/test/java/org/databiosphere/workspacedataservice/jobexec/QuartzJobTest.java
Outdated
Show resolved
Hide resolved
service/src/test/java/org/databiosphere/workspacedataservice/jobexec/QuartzJobTest.java
Show resolved
Hide resolved
service/src/test/java/org/databiosphere/workspacedataservice/jobexec/QuartzJobTest.java
Show resolved
Hide resolved
service/src/test/java/org/databiosphere/workspacedataservice/jobexec/QuartzJobTest.java
Show resolved
Hide resolved
.hasBeenStarted() | ||
.hasBeenStopped(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a test for a failing job that looks for .hasLowCardinalityKeyValue("outcome", "failure")
? You probably need to modify TestableQuartzJob
or create something new that extends QuartzJob
but throws an error in executeInternal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I'm losing my mind here. In this current test implementation I have the execute function wrapped in a assertThrows
, and it fails because it says it doesn't throw anything. But if I get rid of assertThrows
it totally does. I feel like I'm missing something obvious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK @jladieu and I figured this out, and it's confusing. We have two JobExecutionExceptions, one from Quartz and one we define. The one we define extends RuntimeException and therefore doesn't need to be caught/handled, but the quartz one does. QuartzJob.execute throws a org.quartz.JobExecutionException (and therefore needs the throws org.quartz.JobExecutionException
in the method signature), but all our custom jobs throw various RuntimeException ones. So I had the right idea having the TestQuartzJob throw our custom one but it made writing this test VERY confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this explanation helps to explain …
The typical data flow is:
- Quartz looks for an
execute()
method and runs that method when it kicks off a job - in the WDS codebase, we define the
execute()
method inQuartzJob
. QuartzJob.execute()
wraps and calls theexecuteInternal()
method defined in subclasses such asTdrManifestQuartzJob
orPfbQuartzJob
- As part of its wrapping,
QuartzJob.execute()
!!catches any exceptions thrown byexecuteInternal
!!. On catch, it marks the job as failed and swallows the exception so Quartz can mark its job as complete.
In this test:
- the test defines a
TestableQuartzJob
subclass ofQuartzJob
- the test throws an exception !!from
executeInternal()
!! - the
QuartzJob.execute()
wrapper properly catches the error thrown fromexecuteInternal()
, marks the job as failed, registers metrics, and swallows the error.
So, it's expected that when this test calls TestableQuartzJob(...).execute
, it will not throw an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I figured it out eventually. I personally didn't know that RuntimeException had different behavior than regular Exception and that required some compiler fighting and extra confusion with the namespaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see #475 for my requested changes to prevent double-counting of metrics!
observability tweaks to remove observationRegistry bean and add testobservationRegistry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one nit/suggestion inline, giving this a proactive 👍
service/src/test/java/org/databiosphere/workspacedataservice/jobexec/QuartzJobTest.java
Outdated
Show resolved
Hide resolved
Co-authored-by: David An <davidan@broadinstitute.org>
Quality Gate passedKudos, no new issues were introduced! 0 New issues |
Added Observability to QuartzJob.execute denoting running, failure, and succeed states. The
DefaultMeterObservationHandler
gives us a built in Timer, Counter, and LongTaskTimer.Here's how it reports in Prometheus.
Reminder:
PRs merged into main will not automatically generate a PR in https://github.com/broadinstitute/terra-helmfile to update the WDS image deployed to kubernetes - this action will need to be triggered manually by running the following github action: https://github.com/DataBiosphere/terra-workspace-data-service/actions/workflows/tag.yml. Dont forget to provide a Jira Id when triggering the manual action, if no Jira ID is provided the action will not fully succeed.
After you manually trigger the github action (and it completes with no errors), you must go to the terra-helmfile repo and verify that this generated a PR that merged successfully.
The terra-helmfile PR merge will then generate a PR in leonardo. This will automerge if all tests pass, but if jenkins tests fail it will not; be sure to watch it to ensure it merges. To trigger jenkins retest simply comment on PR with "jenkins retest".