JMX Gatherer collection interval increases with time #926
cc: jmx-metrics component owners @breedx-splk @Mrod1598 @rmfitzpatrick @dehaansa
Took a look at the code, and the runnable is scheduled with … We could change to use … As far as JVM resources or splitting into multiple gatherers, it's hard to give specific advice without understanding your systems and your custom script; either might be helpful. If you're consistently seeing these delays, you could also reduce your collection interval to something small like 1 second. The collections would then restart quickly after the previous one finishes, assuming they consistently take 30s+ to execute.
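The method names in the comment above did not survive extraction, so which scheduling mode the gatherer uses is not stated here. As a general illustration of how the two `ScheduledExecutorService` modes behave on a single thread when a run is slow, here is a small model of when each run starts. The class and method names are hypothetical, and it assumes an initial delay of 0 and made-up run durations. With `scheduleAtFixedRate`, a slow run delays only the next start and the schedule then catches up; with `scheduleWithFixedDelay`, every slow run pushes all later runs back.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not the gatherer's code) of start times on a
 * single-threaded ScheduledExecutorService. Times are in seconds; the
 * initial delay is assumed to be 0.
 */
public class ScheduleModel {

    // scheduleAtFixedRate: run i is due at i * period, but a run never starts
    // before the previous one has finished (executions do not overlap).
    static List<Long> fixedRateStarts(long period, long[] durations) {
        List<Long> starts = new ArrayList<>();
        long previousEnd = 0;
        for (int i = 0; i < durations.length; i++) {
            long start = Math.max(i * period, previousEnd);
            starts.add(start);
            previousEnd = start + durations[i];
        }
        return starts;
    }

    // scheduleWithFixedDelay: each run starts `delay` after the previous run ends.
    static List<Long> fixedDelayStarts(long delay, long[] durations) {
        List<Long> starts = new ArrayList<>();
        long start = 0;
        for (long duration : durations) {
            starts.add(start);
            start = start + duration + delay;
        }
        return starts;
    }

    public static void main(String[] args) {
        long[] durations = {10, 40, 10, 10}; // one slow (40 s) collection
        System.out.println("fixed rate:  " + fixedRateStarts(30, durations));
        System.out.println("fixed delay: " + fixedDelayStarts(30, durations));
    }
}
```

With one 40 s collection, `main` prints `fixed rate:  [0, 30, 70, 90]` and `fixed delay: [0, 40, 110, 150]`.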
@akats7 what are you using to determine that the collection interval is increasing periodically? The collection occurs in a single-threaded executor service's … That value is also used to set the …

Edit: would you be able to provide some of your (suitably redacted) script?
I agree this wouldn't be desirable given request load concerns, and it would also create the burden of handling RejectedExecutionExceptions at arbitrary points in execution.
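For context on the `RejectedExecutionException` concern: if collections were handed to a bounded pool instead of a single scheduled thread, submissions fail once the pool's threads and queue are full, and every submission point then has to handle that. A minimal sketch, assuming a hypothetical pool of one thread with a queue of one and the JDK's default `AbortPolicy`; the class name and the blocking jobs are illustrative only.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical bounded pool for collections: 1 thread, queue capacity 1. */
public class BoundedPoolDemo {

    /** Submits `tasks` blocking jobs and returns how many were rejected. */
    static int countRejected(int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1));
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    // Each job blocks until released, standing in for a slow collection.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // the caller must deal with this at the submission point
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + countRejected(4));
    }
}
```

Submitting four blocking jobs reports two rejections: one job occupies the thread, one waits in the queue, and the remaining two are rejected.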
Hi all, thanks for the replies. Here is a Groovy script similar to the ones we've been using here. Here are our session props (plus some additional ones for SSL). We have also noticed that the issue may be caused by a memory leak: we've been tracking the memory utilization of the gatherer's Java process and see that it increases incrementally, roughly doubling after running for a couple of days.
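Process-level memory includes more than the Java heap, so one way to check whether heap growth is behind the increase is to sample used heap from the platform `MemoryMXBean` (the same data is exposed remotely as the `java.lang:type=Memory` MBean). A small sketch; the class name and the sampling loop are illustrative assumptions, not part of the gatherer.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Samples heap usage of the current JVM through the platform MemoryMXBean. */
public class HeapSampler {

    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        // Log used heap a few times, one second apart.
        for (int i = 0; i < 3; i++) {
            System.out.printf("used heap: %d MiB%n", usedHeapBytes() / (1024 * 1024));
            Thread.sleep(1000);
        }
    }
}
```

A leak typically shows up as a floor of used heap that keeps rising over time, rather than a sawtooth that keeps returning to a stable level after garbage collections.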
We look at how often data arrives in our observability backend and see that the interval between data points is increasing.
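Measuring this from the backend amounts to looking at the gaps between consecutive data-point timestamps and flagging those that are longer than the configured interval. A minimal sketch; the timestamps are made up to match the pattern in the report (not measured data), and the 5 s tolerance is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Computes gaps between consecutive data-point timestamps (in seconds). */
public class IntervalCheck {

    static List<Long> gaps(long[] timestamps) {
        List<Long> result = new ArrayList<>();
        for (int i = 1; i < timestamps.length; i++) {
            result.add(timestamps[i] - timestamps[i - 1]);
        }
        return result;
    }

    // Returns the gaps that exceed the configured interval by more than `tolerance`.
    static List<Long> lateGaps(long[] timestamps, long interval, long tolerance) {
        List<Long> late = new ArrayList<>();
        for (long gap : gaps(timestamps)) {
            if (gap > interval + tolerance) {
                late.add(gap);
            }
        }
        return late;
    }

    public static void main(String[] args) {
        // Illustrative timestamps: gaps of 30 s, then 60 s, 90 s, 120 s.
        long[] ts = {0, 30, 90, 180, 300};
        System.out.println(gaps(ts));            // [30, 60, 90, 120]
        System.out.println(lateGaps(ts, 30, 5)); // [60, 90, 120]
    }
}
```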
Glad to see we are not alone. We are seeing the same behavior, and after roughly 10 days it stops sending metrics entirely. See details in #861.
@akats7 that seems reasonable, though this behavior is not what I would expect from the code. I know @rmfitzpatrick implemented the change you're describing in PR #253, but it became stale; maybe it should be reopened.
I believe this issue should be improved by my PR #949, which addresses several other issues by refactoring how callbacks are created and executed. I've personally seen the heap get GC'd successfully without memory leak issues, but it's possible there are other memory leaks in the receiver that I haven't addressed. I know at least that the repeated callbacks issue is no longer present in that PR.
Awesome, thanks @dehaansa! I'll do some testing as well.
Where in the source code is this happening?
Hey @jack-berg, I'm now also not seeing that happen... I'm a bit confused myself, to be honest. I'm now seeing that the callbackRegistrations list continuously grows, as I originally assumed. Perhaps I modified something myself while testing locally.
Well, in general, a CallbackRegistration is created each time an async instrument callback is registered, and it's not dereferenced until the resulting instrument is closed. For example:
In a normal workflow, a small number of async instruments are created at application start. They are never closed and observe values from the same callback function for the lifecycle of the application. I suspect the JMX metric gatherer may be continuously registering new callbacks without closing the old ones.
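In the OpenTelemetry Java API, `buildWithCallback` returns an observable instrument (for example `ObservableDoubleGauge`) that is `AutoCloseable`, and closing it is what removes the callback. The pattern described above can be modeled without the SDK: the `Registry` below is a hypothetical, simplified stand-in (not the SDK's actual code) that keeps each registration until its handle is closed, so re-registering on every collection cycle without closing grows the list without bound.

```java
import java.util.ArrayList;
import java.util.List;

public class CallbackRegistryDemo {

    /** Handle returned on registration; closing it unregisters the callback. */
    interface Registration extends AutoCloseable {
        @Override
        void close(); // narrowed to throw no checked exception
    }

    /** Hypothetical registry: keeps every callback until its handle is closed. */
    static final class Registry {
        private final List<Runnable> registrations = new ArrayList<>();

        Registration register(Runnable callback) {
            registrations.add(callback);
            return () -> registrations.remove(callback);
        }

        int size() {
            return registrations.size();
        }
    }

    /** Simulates collection cycles that each register a fresh callback. */
    static int registrationsAfter(int cycles, boolean closePrevious) {
        Registry registry = new Registry();
        Registration previous = null;
        for (int i = 0; i < cycles; i++) {
            if (closePrevious && previous != null) {
                previous.close(); // drop the old callback before replacing it
            }
            Runnable callback = () -> { /* would read MBean attributes here */ };
            previous = registry.register(callback);
        }
        return registry.size();
    }

    public static void main(String[] args) {
        System.out.println("never closed:   " + registrationsAfter(100, false));
        System.out.println("close previous: " + registrationsAfter(100, true));
    }
}
```

`main` reports 100 live registrations for the leaking pattern and 1 when each previous handle is closed before registering a new one.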
#949 was merged and should be available in 1.29.0. It resolves the repeated callbacks issue, as well as several others; let us know if this behavior persists once the new release is available.
@akats7 have you had a chance to try this to confirm it's resolved for you?
This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.
Hey @breedx-splk, I was able to confirm that the memory leak has been resolved, but I have not had a chance to check whether it also resolved the increasing interval. I'm fairly sure the leak was the cause, but I will validate shortly.
Hey @akats7 (and others). It looks like this has gone stale, which I think is a pretty good indication that this is no longer an issue for folks. If it is, please feel free to reopen with new testing results, but for now I'm going to close. Thanks!
Component(s)
jmx-metrics
What happened?
Description
Hi all, we are using the JMX gatherer with a custom Groovy script that has 60-70 instruments. We configure a default interval of 30s, but have noticed that as the gatherer runs for a prolonged period, the collection interval increases in steps of 30s (i.e., it becomes 60s -> 90s -> 120s -> ...). We have some more complex instruments for Kafka MBeans that also create multiple metrics per rule. On some instances we see about 7k metrics per collection interval.
Has this behavior been observed before, and has any load testing been done on the gatherer?
We're trying to gauge what the limit should be and are exploring a few solutions, such as adjusting our JVM params or running multiple gatherers in parallel, each with a narrower set of instruments.
Wanted to get the thoughts of the team, thanks.
Component version
1.26.0
Log output
N/A; no additional info in the logs apart from duplicate metric warnings.
Additional context
No response