Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jmx-metrics] Collect in callback #949

Merged
merged 16 commits into from
Jul 26, 2023
Merged

[jmx-metrics] Collect in callback #949

merged 16 commits into from
Jul 26, 2023

Conversation

dehaansa
Copy link
Contributor

@dehaansa dehaansa commented Jul 11, 2023

Description:
Update the jmx metrics gatherer to collect the actual metric values being reported in the callback.

This was initially started to resolve an issue where once a target went down, the same metric values continued to report. The existing code registers its async callback(s) (as many as have been successful collections, so a potential memory leak) to report the value last collected when the groovy script(s) are executed, so when the SDK triggers its collectAll() method, those callbacks are executed and the last collected value is reported. Now, on execution of groovy scripts callbacks are registered with the MBeans that currently match the filter in the script, but the actual metric collection happens inside of the callback that is executed when the SDK goes to export.

Existing Issue(s): #910
May be relevant to several other issues including memory leaks recently reported & issues with duplicate data errors reporting.

Testing: Local testing against JVM targets. Trying to wrap my head around adding unit tests that would cover this scenario, but as long as the integration tests work that should cover the behavior.

Have validated locally that the following situations work as expected

  • metrics gatherer starts with target unavailable, then target becomes available. No metrics collected initially, once target comes available metrics flow as expected
  • metrics gatherer starts with target available, target becomes unavailable, then later becomes available again. Metrics collected initially, collection returns no metrics while target is unavailable, then metrics collected as expected again once target is again available

Documentation:

Should not be a user-facing change that would require documentation. The target of this refactor was to work with the existing JMX scripts without any issues.

Outstanding items:

  • Add thorough comments to the new code

@dehaansa dehaansa marked this pull request as ready for review July 11, 2023 05:26
@dehaansa dehaansa requested a review from a team July 11, 2023 05:26
// Look at the first mbean collected to evaluate if the attributes requested are
// composite data types or simple. Composite types require different parsing to
// end up with multiple recorders in the same callback.
def bean = mbeans.first()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure it's guaranteed the first will be reflective of the rest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not guaranteed 100%, but it would break metric conventions if some mbeans in a group were composite data and some were not (IE you would have jvm.memory.pool metrics & the jvm.memory.pool.(committed|used|etc) metrics if that instrument contained some composite & some simple.

We could add some complexity in order to split mbeans by type, or maybe we just log an error and bail if not all of the mbeans in an instrument match?

Copy link
Contributor

@rmfitzpatrick rmfitzpatrick Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it's better for the aggregation behavior of the current update implementation to be preserved where it's splitting by per-mbean value.

def value = (CompositeData) mbeans.first().getProperty(attribute)
compositeAttributes.add(new Tuple2<String, Set<String>>(attribute, value.getCompositeType().keySet()))

Here we get the value from the first mbean and then limit the CompositeType* and CompositeData key set for all other mbeans. Given how relaxed the mbean query can be, and how general CompositeData values can be I don't think this is a valid restriction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. As the batch callback needs to register all of its expected instruments, this might be a little complex, will take a pass at it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is improved, let me know!

…s/GroovyMetricEnvironment.java

Co-authored-by: Ryan Fitzpatrick <10867373+rmfitzpatrick@users.noreply.github.com>
Consumer<ObservableDoubleMeasurement> existing = existingUpdater.get();
existing.accept(doubleResult);
};
private <T extends ObservableMeasurement> T registerCallback(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helped a lot with readability.

@akats7
Copy link
Contributor

akats7 commented Jul 19, 2023

Hey @dehaansa, I tested this change under the same conditions that was causing the rampant heap usage and looks like this fully resolved the issue.

} catch (AttributeNotFoundException ignored) {
logger.fine("Attribute ${attribute} not found on any of the collected mbeans")
} catch (InvalidAttributeValueException ignored) {
logger.info("Attribute ${attribute} was not consistently CompositeData for " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may warrant a ~breaking change changelog entry or readme update in the chance that users run into it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a note to the readme, from what I can see it looks like changelog entries are added outside the scope of the PR introducing the change. If I'm wrong though, I'm happy to add whatever's required for a changelog entry.

Copy link
Contributor

@rmfitzpatrick rmfitzpatrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇 thanks!

@trask trask added this to the v1.29.0 milestone Jul 19, 2023
jmx-metrics/README.md Outdated Show resolved Hide resolved
Co-authored-by: jason plumb <75337021+breedx-splk@users.noreply.github.com>
@dehaansa
Copy link
Contributor Author

@trask I see this was tagged for release, anything left that it needs to be merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants