-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
suggested tweaks to observability setup #475
Conversation
ObservationRegistry testObservationRegistry() { | ||
return TestObservationRegistry.create(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This replaces the default ObservationRegistry
bean with a TestObservationRegistry
inside this test, instead of keeping the default ObservationRegistry
bean and adding an additional (non-bean) TestObservationRegistry
.
@@ -64,16 +78,14 @@ void beforeAll() { | |||
.thenThrow(new RuntimeException("test failed via jobDao.fail()")); | |||
when(jobDao.fail(any(), anyString())) | |||
.thenThrow(new RuntimeException("test failed via jobDao.fail()")); | |||
// set up metrics registries | |||
testObservationRegistry = TestObservationRegistry.create(); | |||
Metrics.globalRegistry.add(meterRegistry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this setup is not necessary; Spring handles it for us once we make TestObservationRegistry
a Spring-managed bean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad that creating a TestObservationRegistry bean does get Metrics to work (it didn't in my implementation, hence why I had to create a test and a regular observation registry)
meterRegistry.clear(); | ||
Metrics.globalRegistry.clear(); | ||
((TestObservationRegistry) observationRegistry).clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously, this cleared everything after all tests had run. There's no point to that, as Spring will destroy the context after the whole test finishes anyway.
Instead, I am now clearing everything before each test; this prevents metrics recorded in one test bleeding into the metrics being read by another test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried the beforeeach method on thursday but i think I was missing the TestObservationRegistry bean as the missing piece, thank you
// since the job we just ran is complete, we expect it to show zero active tasks | ||
// and zero duration. | ||
assertEquals(0, longTaskTimer.activeTasks()); | ||
assertEquals(0, longTaskTimer.duration(TimeUnit.SECONDS)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously, this test asserted that longTaskTimer.duration
was nonzero. The fact it was nonzero seems to be an artifact of recording the same observation twice to the same meterRegistry. Since this is the "*active" metric, it should only be recording currently-in-flight observations, and we should expect it to be zero. Once I cleared up the double-recording, it is now zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is why we had a count of 2 after executing a job once. Spring already connects the ObservationRegistry
to the MeterRegistry
. But here, we were connecting it a second time … and thus, we were recording each observation twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dang, introducing this config was my suggestion...I didn't realize Spring was already providing this exact thing out of the box.
I think this perhaps remains a valuable reference snippet for how we might want to customize the observation config when we have a reason to do so, so I wanted to ask to check my own comprehension:
If we just leave the Config class in place and only delete line 23 (.observationHandler(...)
), would it have the same/desired effect as dropping the whole file? I imagine wanting to re-introduce this class at some point for the purpose of overriding config, but don't want to cause a regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tricky!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jladieu my $.02 would be to look back at this PR, or the PR that introduced this code, if we need a reference in the future instead of keeping it in the runtime codebase but doing nothing. That said, yeah, I think just removing the .observationHandler(...)
line would have the same effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice sleuthing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dang, introducing this config was my suggestion...I didn't realize Spring was already providing this exact thing out of the box.
I think this perhaps remains a valuable reference snippet for how we might want to customize the observation config when we have a reason to do so, so I wanted to ask to check my own comprehension:
If we just leave the Config class in place and only delete line 23 (.observationHandler(...)
), would it have the same/desired effect as dropping the whole file? I imagine wanting to re-introduce this class at some point for the purpose of overriding config, but don't want to cause a regression.
|
||
// override Spring's standard ObservationRegistry with a TestObservationRegistry | ||
@TestConfiguration | ||
static class TestObservationConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you consider making this config universal for all tests? If that's possible, it might be nice to provide an accessor method for the global TestObservationRegistry
that tests can use instead of having to @Autowire ObservationRegistry
and typecast anywhere that's needed. The typecasting feels clumsy and it'd be nice if it wasn't necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately making this config universal for all tests isn't super easy. I've moved the static inner class to a reusable standalone class … but it still requires any test that needs it to include it via @Import
and to perform the appropriate casting.
fwiw, I kinda like the inclusion of
// this test requires a TestObservationRegistry
TestObservationRegistry testObservationRegistry =
assertInstanceOf(TestObservationRegistry.class, observationRegistry);
in the unit test, as it ensures the unit test is set up correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for digging into this David - and the easy explanation of what you found through all the comments.
Quality Gate passedKudos, no new issues were introduced! 0 New issues |
@ashanhol feel free to merge this at will; I didn't want to merge directly into your branch myself. |
return observationRegistry -> | ||
observationRegistry | ||
.observationConfig() | ||
.observationHandler(new DefaultMeterObservationHandler(meterRegistry)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm a little confused. Without explicitly defining the DefaultMeterObservationHandler in the normal ObservationConfig, i noticed the timers were coming out 0 in my manual testing pulling the prometheus dump, even though theoretically a default one is supposed to be created. Was that not a problem for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the timers were coming out 0
which timer, specifically? My understanding is that the wds.job.execute
timer tracks historical data and persists timing info after the job is done. But, wds.job.execute.active
- note the "active" suffix - is only supposed to track jobs that are currently in flight. So, the wds.job.execute.active
timer should be zero unless you catch it at exactly the right time, when a job has started but not yet completed.
When I ran manually via bootRun
, I saw the expected results in the prometheus scrape API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah shoot it was active seconds so yeah I went out of my way to give wrong results. I had a really hard time finding definitions for all the metrics (timers, max, etc), how did you find clarification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed the definition when reading the javadoc/comments for LongTaskTimer
. I was clicking around in IntelliJ but you can see the comments here, too: https://www.javadoc.io/doc/io.micrometer/micrometer-core/latest/io/micrometer/core/instrument/LongTaskTimer.html
A long task timer is used to track the total duration of all in-flight long-running tasks and the number of such tasks.
"in-flight" was the key description.
Note this is a PR to #461, not to
main
See my comments inline for explanation.