Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConcurrentModificationException when late meter filters are added #5489

Closed
mabartos opened this issue Sep 24, 2024 · 12 comments · Fixed by #5490
Closed

ConcurrentModificationException when late meter filters are added #5489

mabartos opened this issue Sep 24, 2024 · 12 comments · Fixed by #5490
Labels
bug A general bug module: micrometer-core An issue that is related to our core module
Milestone

Comments

@mabartos
Copy link
Contributor

mabartos commented Sep 24, 2024

Describe the bug
In Keycloak, we're facing an issue (keycloak/keycloak#33246) that the server is unable to start when metrics are enabled - we use Quarkus under the hood with the Quarkus Micrometer extension.

It started to happen after the Micrometer 1.13.x upgrade on the Quarkus side.
It happens only sometimes, and is reproducible after multiple start/stops of Keycloak (Quarkus-based) server.

The issue is probably touching the addition of meter filters after the creation of Meters described here: #4920

The related log:

2024-09-24 15:24:07,330 INFO  [org.jgroups.protocols.FD_SOCK2] (ForkJoinPool.commonPool-worker-1) server listening on *.34764
2024-09-24 15:24:07,389 INFO  [org.jgroups.protocols.pbcast.GMS] (ForkJoinPool.commonPool-worker-1) mabartos-12454: no members discovered after 55 ms: creating cluster as coordinator
2024-09-24 15:24:07,406 INFO  [org.infinispan.CLUSTER] (ForkJoinPool.commonPool-worker-1) ISPN000094: Received new cluster view for channel ISPN: [mabartos-12454|0] (1) [mabartos-12454]
2024-09-24 15:24:07,489 INFO  [org.infinispan.CLUSTER] (ForkJoinPool.commonPool-worker-1) ISPN000079: Channel `ISPN` local address is `mabartos-12454`, physical addresses are `[10.202.149.163:50300]`
2024-09-24 15:24:07,901 DEBUG [io.micrometer.common.util.internal.logging.InternalLoggerFactory] (main) Using SLF4J as the default logging framework
2024-09-24 15:24:07,903 DEBUG [io.quarkus.micrometer.runtime.MicrometerRecorder] (main) Discovered global MeterFilters : [io.micrometer.core.instrument.config.MeterFilter$9@15311af2, io.micrometer.core.instrument.config.MeterFilter$9@796434a3, io.micrometer.core.instrument.config.MeterFilter$9@1626612d, io.micrometer.core.instrument.config.MeterFilter$9@542aba71]
2024-09-24 15:24:07,904 DEBUG [io.quarkus.micrometer.runtime.MicrometerRecorder] (main) Discovered MeterFilters for class io.micrometer.prometheus.PrometheusMeterRegistry: []
2024-09-24 15:24:07,904 DEBUG [io.quarkus.micrometer.runtime.MicrometerRecorder] (main) Discovered global MeterRegistryCustomizer : []
2024-09-24 15:24:07,905 DEBUG [io.quarkus.micrometer.runtime.MicrometerRecorder] (main) Configuring Micrometer registries : [class io.micrometer.prometheus.PrometheusMeterRegistry]
2024-09-24 15:24:07,905 DEBUG [io.quarkus.micrometer.runtime.MicrometerRecorder] (main) Discovered MeterRegistryCustomizer for class io.micrometer.prometheus.PrometheusMeterRegistry: []
2024-09-24 15:24:07,907 INFO  [com.arjuna.ats.jbossatx] (main) ARJUNA032014: Stopping transaction recovery manager
ERROR: Failed to start server in (production) mode
Error details:
java.util.ConcurrentModificationException
	at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1605)
	at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1628)
	at java.base/java.util.AbstractCollection.addAll(AbstractCollection.java:337)
	at io.micrometer.core.instrument.MeterRegistry$Config.meterFilter(MeterRegistry.java:832)
	at io.quarkus.micrometer.runtime.MicrometerRecorder.applyMeterFilters(MicrometerRecorder.java:233)
	at io.quarkus.micrometer.runtime.MicrometerRecorder.configureRegistries(MicrometerRecorder.java:89)
	at io.quarkus.deployment.steps.MicrometerProcessor$configureRegistry1996010268.deploy_0(Unknown Source)
	at io.quarkus.deployment.steps.MicrometerProcessor$configureRegistry1996010268.deploy(Unknown Source)
	at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
	at io.quarkus.runtime.Application.start(Application.java:101)
	at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:119)
	at io.quarkus.runtime.Quarkus.run(Quarkus.java:71)
	at org.keycloak.quarkus.runtime.KeycloakMain.start(KeycloakMain.java:147)
	at org.keycloak.quarkus.runtime.cli.command.AbstractStartCommand.run(AbstractStartCommand.java:51)
	at picocli.CommandLine.executeUserObject(CommandLine.java:2030)
	at picocli.CommandLine.access$1500(CommandLine.java:148)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
	at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
	at picocli.CommandLine.execute(CommandLine.java:2174)
	at org.keycloak.quarkus.runtime.cli.Picocli.parseAndRun(Picocli.java:138)
	at org.keycloak.quarkus.runtime.KeycloakMain.main(KeycloakMain.java:107)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:62)
	at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:33)

Environment

  • Quarkus version - 3.14.4
  • Micrometer version - 1.13.3
  • Micrometer registry - Prometheus
  • OS: Linux 6.10.6-100.fc39.x86_64 SMP PREEMPT_DYNAMIC GNU/Linux
  • Java version: openjdk 21.0.3 2024-04-16 LTS

To Reproduce
How to reproduce the bug:

  • Start Keycloak server with enabled Quarkus micrometer extension sequentially multiple times

f.e. this test case https://github.com/keycloak/keycloak/blob/main/quarkus/tests/integration/src/test/java/org/keycloak/it/cli/dist/StartCommandDistTest.java#L134

Expected behavior
No ConcurrentModificationException

Additional context
It seems the problem is lack of synchronization

mabartos added a commit to mabartos/micrometer that referenced this issue Sep 24, 2024
Fixes micrometer-metrics#5489

Signed-off-by: Martin Bartoš <mabartos@redhat.com>
mabartos added a commit to mabartos/micrometer that referenced this issue Sep 24, 2024
Fixes micrometer-metrics#5489

Signed-off-by: Martin Bartoš <mabartos@redhat.com>
@shakuzen shakuzen added bug A general bug and removed waiting-for-triage labels Sep 25, 2024
@shakuzen shakuzen added this to the 1.13.x milestone Sep 25, 2024
@shakuzen shakuzen added the module: micrometer-core An issue that is related to our core module label Sep 25, 2024
@shakuzen
Copy link
Member

Thanks for the report. I was able to reproduce the issue running the specified test in Keycloak. I'm working on making a minimal test to reproduce it in our test suite before we decide on the best solution. We need to do something on the Micrometer side to handle this regardless, but I'm curious how difficult would it be to change things in Quarkus/Keycloak to ensure all MeterFilters are configured before meters are registered in this case? I did not look into detail about which parts of the code are configuring MeterFilters and which are registering meters.

mabartos added a commit to mabartos/micrometer that referenced this issue Sep 25, 2024
Fixes micrometer-metrics#5489

Signed-off-by: Martin Bartoš <mabartos@redhat.com>
@mabartos
Copy link
Contributor Author

I'm working on making a minimal test to reproduce it in our test suite before we decide on the best solution.

@shakuzen Great!

We need to do something on the Micrometer side to handle this regardless, but I'm curious how difficult would it be to change things in Quarkus/Keycloak to ensure all MeterFilters are configured before meters are registered in this case?

I don't know so many details about how Quarkus exactly handles that, but @brunobat mentioned something with regards to Quarkus in this #4920 (comment).

jfyi - In Keycloak, we use one custom MeterFilter, but even when I disable it, the issue still occurs (so no question about static/runtime init of the specific meter filter).

cc @gsmet

@brunobat
Copy link

brunobat commented Sep 25, 2024

As mentioned in #4920 (comment), There will always exist MeterFilters that require runtime data and therefore will be configured after the Meters are instantiated.
It might be possible to change things to register all filters before the meters, however, that would require a rewrite of the quarkus-micrometer bootstrap, which is not small work, nor trivial.

mabartos added a commit to mabartos/micrometer that referenced this issue Sep 27, 2024
Fixes micrometer-metrics#5489

Signed-off-by: Martin Bartoš <mabartos@redhat.com>
mabartos added a commit to mabartos/micrometer that referenced this issue Sep 30, 2024
Fixes micrometer-metrics#5489

Co-authored-by: Tommy Ludwig <8924140+shakuzen@users.noreply.github.com>
Signed-off-by: Martin Bartoš <mabartos@redhat.com>
mabartos added a commit to mabartos/micrometer that referenced this issue Sep 30, 2024
Fixes micrometer-metrics#5489

Co-authored-by: Tommy Ludwig <8924140+shakuzen@users.noreply.github.com>
Signed-off-by: Martin Bartoš <mabartos@redhat.com>
shakuzen added a commit to mabartos/micrometer that referenced this issue Sep 30, 2024
Fixes micrometer-metrics#5489

Co-authored-by: Tommy Ludwig <8924140+shakuzen@users.noreply.github.com>
Signed-off-by: Martin Bartoš <mabartos@redhat.com>
@shakuzen shakuzen modified the milestones: 1.13.x, 1.13.5 Sep 30, 2024
@shakuzen
Copy link
Member

The fix is available in snapshots as 1.13.5-SNAPSHOT now. Would you be able to try with that and verify the CME isn't happening anymore in your tests?

@mabartos
Copy link
Contributor Author

@shakuzen Sure, I'll do it.

@mabartos
Copy link
Contributor Author

mabartos commented Sep 30, 2024

@shakuzen +1 I can confirm, that the fix resolved the issue on our side as I'm unable to reproduce it anymore.

The issue reproducible test usually failed after ~10 startups.

Test executed again >~100x with success.

Included JARs in our distribution (1.13.5-SNAPSHOT):

> ll | grep micrometer
-rw-r--r--. 1 mabartos mabartos  47K Sep 30 11:52 io.micrometer.micrometer-commons-1.13.5-SNAPSHOT.jar
-rw-r--r--. 1 mabartos mabartos 843K Sep 30 11:52 io.micrometer.micrometer-core-1.13.5-SNAPSHOT.jar
-rw-r--r--. 1 mabartos mabartos  71K Sep 30 11:52 io.micrometer.micrometer-observation-1.13.5-SNAPSHOT.jar
-rw-r--r--. 1 mabartos mabartos  44K Sep 30 11:53 io.micrometer.micrometer-registry-prometheus-simpleclient-1.13.5-SNAPSHOT.jar
-rw-r--r--. 1 mabartos mabartos 216K Sep 30 12:25 io.quarkus.quarkus-micrometer-3.15.1.jar
-rw-r--r--. 1 mabartos mabartos 3.0K Sep 30 12:25 io.quarkus.quarkus-micrometer-registry-prometheus-3.15.1.jar

@mabartos
Copy link
Contributor Author

mabartos commented Oct 1, 2024

@shakuzen Hi, when can we expect the 1.13.5 release? :))

@shakuzen
Copy link
Member

shakuzen commented Oct 1, 2024

It's scheduled for October 14. Dates for the upcoming releases are set on the GitHub milestones (as long as we don't forget to set them): https://github.com/micrometer-metrics/micrometer/milestone/264

@mabartos
Copy link
Contributor Author

mabartos commented Oct 1, 2024

@shakuzen Oh, that's so unfortunate. This issue is a blocker for us as our major release relies on it. We also need to include it in the Quarkus LTS.

Do you really need to stick to that date with this patch release? Can we prioritize, or escalate it somehow from our side?

The thing is, whether would be possible to release this patch release asap, and the planned workload for this release postpone to 1.13.6?

Thanks

@shakuzen
Copy link
Member

shakuzen commented Oct 2, 2024

Given the nature of this bug and the difficulty in working around it without significant work, we've released 1.13.5 earlier than our regular release schedule. It's available in Maven Central now. Thanks for all your help reporting the issue, providing a fix, and testing the snapshots.

@mabartos
Copy link
Contributor Author

mabartos commented Oct 2, 2024

@shakuzen Thank you very much! :))

@gsmet
Copy link

gsmet commented Oct 2, 2024

Thanks for the quick release, it's very much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A general bug module: micrometer-core An issue that is related to our core module
Projects
None yet
4 participants