Cassandra 4.1 process does not start with ZGC enabled #1368

Closed
iAlex97 opened this issue Jul 12, 2024 · 2 comments
Labels
bug Something isn't working in-progress Issues in the state 'in-progress'

Comments

@iAlex97

iAlex97 commented Jul 12, 2024

What happened?

I just got started with the K8ssandra operator and can't wait to migrate our on-premise cluster to it. I had previously run that cluster (version 3.11) with Shenandoah GC and seen the latency improvements, so enabling ZGC was among the first things I tried. However, after checking out 4.0-jdk11-G1, the Cassandra pods never fully initialised, because the Cassandra process exited immediately on startup.

Did you expect to see something different?

I would expect the cluster to come up normally using the test fixture.

How to reproduce it (as minimally and precisely as possible):

  1. Install the K8ssandra operator using Helm
  2. kubectl apply -f manifest.yaml
  3. The readiness probe always returns HTTP 500

Environment

  • K8ssandra Operator version:

    1.17.0

  • Kubernetes version information:

    Server Version: version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.5", GitCommit:"59755ff595fa4526236b0cc03aa2242d941a5171", GitTreeState:"clean", BuildDate:"2024-05-14T10:39:39Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:

    Kubespray on bare metal

  • Manifests:

```yaml
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: prod
  namespace: k8ssandra-operator
spec:
  cassandra:
    serverVersion: "4.1.5"

    datacenters:
      - metadata:
          name: fsn1

        size: 3

        resources:
          requests:
            cpu: 24
            memory: 64Gi
            hugepages-2Mi: 5Gi
          limits:
            hugepages-2Mi: 5Gi

        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: topolvm-cassandra
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 300Gi

        config:
          jvmOptions:
            heap_initial_size: 4G
            heap_max_size: 4G
            gc: ZGC
            additionalOptions: {}
              # - -XX:ConcGCThreads=1
              # - -XX:ParallelGCThreads=2 # must be >= ConcGCThreads

        networking:
          hostNetwork: false
```

  • K8ssandra Operator Logs:

not relevant

Anything else we need to know?:

To debug, I exec'd into one of the pods and tried to start the Cassandra process manually:

```shell
export JAVA_VERSION=11
source /opt/cassandra/conf/cassandra-env.sh
/opt/cassandra/bin/cassandra
```

This results in the following output:

```
Error: VM option 'UseZGC' is experimental and must be enabled via -XX:+UnlockExperimentalVMOptions.
Error: The unlock option must precede 'UseZGC'.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
```

Checking the contents of /opt/cassandra/conf/jvm11-server.options:

```
-Djdk.attach.allowAttachSelf=true
--add-exports java.base/jdk.internal.misc=ALL-UNNAMED
--add-exports java.base/jdk.internal.ref=ALL-UNNAMED
--add-exports java.base/sun.nio.ch=ALL-UNNAMED
--add-exports java.management.rmi/com.sun.jmx.remote.internal.rmi=ALL-UNNAMED
--add-exports java.rmi/sun.rmi.registry=ALL-UNNAMED
--add-exports java.rmi/sun.rmi.server=ALL-UNNAMED
--add-exports java.sql/java.sql=ALL-UNNAMED
--add-opens java.base/java.lang.module=ALL-UNNAMED
--add-opens java.base/jdk.internal.loader=ALL-UNNAMED
--add-opens java.base/jdk.internal.ref=ALL-UNNAMED
--add-opens java.base/jdk.internal.reflect=ALL-UNNAMED
--add-opens java.base/jdk.internal.math=ALL-UNNAMED
--add-opens java.base/jdk.internal.module=ALL-UNNAMED
--add-opens java.base/jdk.internal.util.jar=ALL-UNNAMED
--add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED
-Dio.netty.tryReflectionSetAccessible=true
-XX:+UseZGC
-XX:+UnlockExperimentalVMOptions
```

This confirms that -XX:+UseZGC is written before -XX:+UnlockExperimentalVMOptions, whereas the JVM requires the unlock option to come first.
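To catch this ordering problem before starting Cassandra, a minimal sketch like the one below compares the positions of the two flags in an options file. The function name `check_unlock_order` is hypothetical. It assumes each flag sits alone on its own line, as in the file above, and it only inspects that one file, not flags added later through `JVM_OPTS`.

```shell
#!/bin/sh
# check_unlock_order (hypothetical helper): returns 0 when the options file
# either does not enable ZGC or unlocks experimental options before -XX:+UseZGC,
# and 1 when -XX:+UseZGC comes first or is never unlocked.
# Assumes one flag per line, as in jvm11-server.options.
check_unlock_order() {
  file="$1"
  # Line number of the first whole-line match of each flag (empty if absent).
  unlock_line=$(grep -n -x -e '-XX:+UnlockExperimentalVMOptions' "$file" | head -n 1 | cut -d: -f1)
  zgc_line=$(grep -n -x -e '-XX:+UseZGC' "$file" | head -n 1 | cut -d: -f1)

  # ZGC is not enabled in this file: nothing to check.
  if [ -z "$zgc_line" ]; then
    return 0
  fi

  # ZGC is enabled: the unlock flag must exist and appear on an earlier line.
  [ -n "$unlock_line" ] && [ "$unlock_line" -lt "$zgc_line" ]
}
```

For example, running `check_unlock_order /opt/cassandra/conf/jvm11-server.options || echo "unlock flag missing or too late"` inside the pod would flag the file shown above.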

My workaround was to add -XX:+UnlockExperimentalVMOptions to JVM_OPTS:

```shell
export JVM_OPTS="$JVM_OPTS -XX:+UnlockExperimentalVMOptions"
/opt/cassandra/bin/cassandra
# Cassandra now starts normally
```

Finally, I'd also like to mention that ZGC should be backed by hugepages enabled on the nodes; missing hugepages were my first guess as to why the Java process refused to start.
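Before requesting hugepages for ZGC, it may help to confirm that the node actually has them reserved. The sketch below reads the `HugePages_Total` field from `/proc/meminfo` and computes how many pages a request needs; the function names `hugepages_total` and `pages_needed` are hypothetical, and the file path parameter exists only so the function can be pointed at a copy for testing. As plain arithmetic, the manifest's `hugepages-2Mi: 5Gi` corresponds to 5120 MiB / 2 MiB = 2560 pages. This assumes 2 MiB is the node's default hugepage size (typical on x86_64), since `HugePages_Total` counts pages of the default size only.

```shell
#!/bin/sh
# hugepages_total (hypothetical helper): prints HugePages_Total from a
# meminfo-style file, defaulting to the kernel's /proc/meminfo.
# Prints 0 if the field is missing.
hugepages_total() {
  awk '/^HugePages_Total:/ { total = $2; found = 1 }
       END { print (found ? total : 0) }' "${1:-/proc/meminfo}"
}

# pages_needed (hypothetical helper): number of pages of size page_mib (MiB)
# needed to back a request of request_mib (MiB), rounded up.
pages_needed() {
  request_mib="$1"
  page_mib="$2"
  echo $(( (request_mib + page_mib - 1) / page_mib ))
}
```

On the node itself, something like `[ "$(hugepages_total)" -ge "$(pages_needed 5120 2)" ] || echo "not enough 2 MiB hugepages reserved"` compares the reservation with the 5Gi request. Note that the follow-up comment below also passes -XX:+UseLargePages, which is what lets the JVM use those pages.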

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: K8OP-10

@iAlex97 iAlex97 added the bug Something isn't working label Jul 12, 2024
@iAlex97
Author

iAlex97 commented Jul 13, 2024

Finally got it to work using Custom GC like this:

```yaml
        config:
          jvmOptions:
            heap_initial_size: 4G
            heap_max_size: 4G
            gc: Custom
            additionalOptions:
              - -XX:+UnlockExperimentalVMOptions
              - -XX:+UseLargePages
              - -XX:+UseZGC
```

@burmanm burmanm self-assigned this Jul 15, 2024
@adejanovski adejanovski added assess Issues in the state 'assess' in-progress Issues in the state 'in-progress' and removed assess Issues in the state 'assess' labels Jul 15, 2024
@burmanm
Contributor

burmanm commented Sep 10, 2024

This was fixed and works in 1.19.0.

@burmanm burmanm closed this as completed Sep 10, 2024