A Helm chart to deploy Cruise Control for Apache Kafka
Name | Url | |
---|---|---|
ialejandro | hello@ialejandro.rocks | https://ialejandro.rocks |
- Helm 3+
helm repo add cruise-control https://devops-ia.github.io/helm-cruise-control
helm repo update
helm install [RELEASE_NAME] cruise-control/cruise-control
This install all the Kubernetes components associated with the chart and creates the release.
See helm install for command documentation.
Charts are also available in OCI format. The list of available charts can be found here.
helm install [RELEASE_NAME] oci://ghcr.io/devops-ia/helm-cruise-control/cruise-control --version=[version]
helm uninstall [RELEASE_NAME]
This removes all the Kubernetes components associated with the chart and deletes the release.
See helm uninstall for command documentation.
See basic installation and examples.
See Customizing the chart before installing. To see all configurable options with comments:
helm show values cruise-control/cruise-control
Key | Type | Default | Description |
---|---|---|---|
affinity | object | {} |
Affinity for pod assignment |
autoscaling | object | {"enabled":false,"maxReplicas":100,"minReplicas":1,"targetCPUUtilizationPercentage":80} |
Autoscaling with CPU or memory utilization percentage |
capacity | object | {"config":"{\n \"brokerCapacities\":[\n {\n \"brokerId\": \"-1\",\n \"capacity\": {\n \"DISK\": \"1024\",\n \"CPU\": \"100\",\n \"NW_IN\": \"1000\",\n \"NW_OUT\": \"1000\"\n },\n \"doc\": \"This is the default capacity. Capacity unit used for disk is in MB, cpu is in number of cores, network throughput is in KB.\"\n }\n ]\n}\n","type":"capacity"} |
Cruise Control cluster resources |
cluster | object | {"an.example.cluster.config":false,"min.insync.replicas":1} |
Cruise Control cluster config Sample: https://github.com/linkedin/cruise-control/blob/main/config/clusterConfigs.json |
config | object | `{"anomaly.detection.goals":["com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal"],"anomaly.detection.interval.ms":"10000","anomaly.notifier.class":"com.linkedin.kafka.cruisecontrol.detector.notifier.SelfHealingNotifier","bootstrap.servers":"localhost:9092","broker.metric.sample.store.topic":"__KafkaCruiseControlModelTrainingSamples","broker.metrics.window.ms":300000,"broker.sample.store.topic.partition.count":8,"client.id":"cruise-control","cluster.configs.file":"config/clusterConfigs.json","completed.cruise.control.admin.user.task.retention.time.ms":604800000,"completed.cruise.control.monitor.user.task.retention.time.ms":86400000,"completed.kafka.admin.user.task.retention.time.ms":604800000,"completed.kafka.monitor.user.task.retention.time.ms":86400000,"completed.user.task.retention.time.ms":86400000,"connections.max.idle.ms":540000,"cpu.balance.threshold":1.1,"cpu.capacity.threshold":0.7,"cpu.low.utilization.threshold":0,"default.goals":["com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal"],"default.replica.movement.strategies":["com.linkedin.kafka.cruisecontrol.executor.strategy.BaseReplicaMovementStrategy"],"demotion.history.retention.time.ms":1209600000,"disk.balance.threshold":1.1,"disk.capacity.threshold":0.8,"disk.low.utilization.threshold":0,"execution.progress.check.interval.ms":10000,"goals":["com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.kafkaassigner.KafkaAssignerDiskUsageDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.kafkaassigner.KafkaAssignerEvenRackAwareGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal"],"hard.goals":["com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal"],"intra.broker.goals":["com.linkedin.kafka.cruisecontrol.analyzer.goals.IntraBrokerDiskCapacityGoal","com.linkedin.kafka.cruisecontrol.analyzer.goals.IntraBrokerDiskUsageDistributionGoal"],"max.active.user.tasks":5,"max.cached.completed.cruise.control.admin.user.tasks":30,"max.cached.completed.cruise.control.monitor.user.tasks":20,"max.cached.completed.kafka.admin.user.tasks":30,"max.cached.completed.kafka.monitor.user.tasks":20,"max.cached.completed.user.tasks":25,"max.num.cluster.partition.movements":1250,"max.replicas.per.broker":10000,"metric.anomaly.analyzer.metrics":["BROKER_PRODUCE_LOCAL_TIME_MS_50TH","BROKER_PRODUCE_LOCAL_TIME_MS_999TH","BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_50TH","BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_999TH","BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_50TH","BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_999TH","BROKER_LOG_FLUSH_TIME_MS_50TH","BROKER_LOG_FLUSH_TIME_MS_999TH"],"metric.anomaly.detection.interval.ms":120000,"metric.anomaly.finder.class":"com.linkedin.kafka.cruisecontrol.detector.KafkaMetricAnomalyFinder","metric.anomaly.percentile.lower.threshold":10,"metric.anomaly.percentile.upper.threshold":90,"metric.sampler.class":"com.linkedin.kafka.cruisecontrol.monitor.sampling.prometheus.PrometheusMetricSampler","metric.sampler.partition.assignor.class":"com.linkedin.kafka.cruisecontrol.monitor.sampling.DefaultMetricSamplerPartitionAssignor","metric.sampling.interval.ms":120000,"min.samples.per.broker.metrics.window":1,"min.samples.per.partition.metrics.window":1,"min.valid.partition.ratio":0.95,"network.inbound.balance.threshold":1.1,"network.inbound.capacity.threshold":0.8,"network.inbound.low.utilization.threshold":0,"network.outbound.balance.threshold":1.1,"network.outbound.capacity.threshold":0.8,"network.outbound.low.utilization.threshold":0,"num.broker.metrics.windows":20,"num.concurrent.intra.broker.partition.movements":2,"num.concurrent.leader.movements":1000,"num.concurrent.partition.movements.per.broker":5,"num.partition.metrics.windows":5,"num.proposal.precompute.threads":1,"num.sample.loading.threads":8,"partition.metric.sample.store.topic":"__KafkaCruiseControlPartitionMetricSamples","partition.metrics.window.ms":300000,"partition.sample.store.topic.partition.count":8,"prometheus.server.endpoint":"thanos-query.prometheus:9090","proposal.expiration.ms":60000,"removal.history.retention.time.ms":1209600000,"replica.count.balance.threshold":1.1,"replica.movement.strategies":["com.linkedin.kafka.cruisecontrol.executor.strategy.PostponeUrpReplicaMovementStrategy","com.linkedin.kafka.cruisecontrol.executor.strategy.PrioritizeLargeReplicaMovementStrategy","com.linkedin.kafka.cruisecontrol.executor.strategy.PrioritizeSmallReplicaMovementStrategy","com.linkedin.kafka.cruisecontrol.executor.strategy.PrioritizeMinIsrWithOfflineReplicasStrategy","com.linkedin.kafka.cruisecontrol.executor.strategy.PrioritizeOneAboveMinIsrWithOfflineReplicasStrategy","com.linkedin.kafka.cruisecontrol.executor.strategy.BaseReplicaMovementStrategy"],"sample.store.class":"com.linkedin.kafka.cruisecontrol.monitor.sampling.KafkaSampleStore","sample.store.topic.replication.factor":2,"sampling.allow.cpu.capacity.estimation":true,"self.healing.disk.failure.enabled":false,"self.healing.enabled":false,"self.healing.exclude.recently.demoted.brokers":true,"self.healing.exclude.recently.removed.brokers":true,"self.healing.goal.violation.enabled":false,"self.healing.maintenance.event.enabled":false,"self.healing.metric.anomaly.enabled":false,"self.healing.topic.anomaly.enabled":false,"topic.anomaly.finder.class":"com.linkedin.kafka.cruisecontrol.detector.TopicReplicationFactorAnomalyFinder","topic.config.provider.class":"com.linkedin.kafka.cruisecontrol.config.KafkaAdminTopicConfigProvider","topics.excluded.from.partition.movement":"__consumer_offsets.* | __amazon_msk_canary.* |
env | object | {} |
Environment variables to configure application Ref: https://github.com/linkedin/cruise-control/blob/main/kafka-cruise-control-start.sh |
fullnameOverride | string | "" |
String to fully override cruise-control.fullname template |
image | object | {"pullPolicy":"IfNotPresent","repository":"ghcr.io/devops-ia/kafka-cruise-control","tag":""} |
Image registry |
imagePullSecrets | list | [] |
Global Docker registry secret names as an array |
ingress | object | {"annotations":{},"className":"","enabled":false,"hosts":[{"host":"chart-example.local","paths":[{"path":"/","pathType":"ImplementationSpecific"}]}],"tls":[]} |
Ingress configuration to expose app |
jaas | object | {"config":"// Enter appropriate Client entry for secured zookeeper client connections\nClient {\n com.sun.security.auth.module.Krb5LoginModule required\n useKeyTab=true\n keyTab=\"/path/to/zookeeper_client.keytab\"\n storeKey=true\n useTicketCache=false\n principal=\"zookeeper_client@<REALM>\";\n};\n\n// Enter appropriate KafkaClient entry if using the SASL protocol, remove if not\nKafkaClient {\n com.sun.security.auth.module.Krb5LoginModule required\n useKeyTab=true\n keyTab=\"/path/to/kafka_client.keytab\"\n storeKey=true\n useTicketCache=false\n serviceName=\"kafka\"\n principal=\"kafka_client@<REALM>\";\n};\n","enabled":false} |
Cruise Control JAAS configuration Sample: https://github.com/linkedin/cruise-control/blob/main/config/cruise_control_jaas.conf_template |
livenessProbe | object | {"enabled":false,"failureThreshold":3,"initialDelaySeconds":180,"periodSeconds":10,"successThreshold":1,"timeoutSeconds":5} |
Configure liveness checker Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes |
livenessProbeCustom | object | {} |
Custom livenessProbe |
log4j | object | {"appender.console.layout.pattern":"[%d] %p %m (%c)%n","appender.console.layout.type":"PatternLayout","appender.console.name":"STDOUT","appender.console.type":"Console","appender.kafkaCruiseControlAppender.fileName":"${filename}/kafkacruisecontrol.log","appender.kafkaCruiseControlAppender.filePattern":"${filename}/kafkacruisecontrol.log.%d{yyyy-MM-dd-HH}","appender.kafkaCruiseControlAppender.layout.pattern":"[%d] %p %m (%c)%n","appender.kafkaCruiseControlAppender.layout.type":"PatternLayout","appender.kafkaCruiseControlAppender.name":"kafkaCruiseControlFile","appender.kafkaCruiseControlAppender.policies.time.interval":1,"appender.kafkaCruiseControlAppender.policies.time.type":"TimeBasedTriggeringPolicy","appender.kafkaCruiseControlAppender.policies.type":"Policies","appender.kafkaCruiseControlAppender.type":"RollingFile","appender.operationAppender.fileName":"${filename}/kafkacruisecontrol-operation.log","appender.operationAppender.filePattern":"${filename}/kafkacruisecontrol-operation.log.%d{yyyy-MM-dd}","appender.operationAppender.layout.pattern":"[%d] %p [%c] %m %n","appender.operationAppender.layout.type":"PatternLayout","appender.operationAppender.name":"operationFile","appender.operationAppender.policies.time.interval":1,"appender.operationAppender.policies.time.type":"TimeBasedTriggeringPolicy","appender.operationAppender.policies.type":"Policies","appender.operationAppender.type":"RollingFile","appender.requestAppender.fileName":"${filename}/kafkacruisecontrol-request.log","appender.requestAppender.filePattern":"${filename}/kafkacruisecontrol-request.log.%d{yyyy-MM-dd-HH}","appender.requestAppender.layout.pattern":"[%d] %p %m (%c)%n","appender.requestAppender.layout.type":"PatternLayout","appender.requestAppender.name":"requestFile","appender.requestAppender.policies.time.interval":1,"appender.requestAppender.policies.time.type":"TimeBasedTriggeringPolicy","appender.requestAppender.policies.type":"Policies","appender.requestAppender.type":"RollingFile","appenders":"console, kafkaCruiseControlAppender, operationAppender, requestAppender","logger.CruiseControlPublicAccessLogger.appenderRef.requestAppender.ref":"requestFile","logger.CruiseControlPublicAccessLogger.level":"info","logger.CruiseControlPublicAccessLogger.name":"CruiseControlPublicAccessLogger","logger.cruisecontrol.appenderRef.kafkaCruiseControlAppender.ref":"kafkaCruiseControlFile","logger.cruisecontrol.level":"info","logger.cruisecontrol.name":"com.linkedin.kafka.cruisecontrol","logger.detector.appenderRef.kafkaCruiseControlAppender.ref":"kafkaCruiseControlFile","logger.detector.level":"info","logger.detector.name":"com.linkedin.kafka.cruisecontrol.detector","logger.operationLogger.appenderRef.operationAppender.ref":"operationFile","logger.operationLogger.level":"info","logger.operationLogger.name":"operationLogger","property.filename":"./logs","rootLogger.appenderRef.console.ref":"STDOUT","rootLogger.appenderRef.kafkaCruiseControlAppender.ref":"kafkaCruiseControlFile","rootLogger.appenderRefs":"console, kafkaCruiseControlAppender","rootLogger.level":"INFO"} |
Cruise Control log4j configuration |
nameOverride | string | "" |
String to partially override cruise-control.fullname template (will maintain the release name) |
nodeSelector | object | {} |
Node labels for pod assignment |
podAnnotations | object | {} |
Pod annotations |
podLabels | object | {} |
Pod labels |
podSecurityContext | object | {} |
Privilege and access control settings for a Pod or Container |
readinessProbe | object | {"enabled":false,"failureThreshold":3,"initialDelaySeconds":10,"periodSeconds":10,"successThreshold":1,"timeoutSeconds":1} |
Configure readinessProbe checker Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes |
readinessProbeCustom | object | {} |
Custom readinessProbe |
replicaCount | int | 1 |
Number of replicas |
resources | object | {} |
The resources limits and requested |
securityContext | object | {} |
Privilege and access control settings |
service | object | {"port":80,"targetPort":9090,"type":"ClusterIP"} |
Kubernetes service to expose Pod |
service.port | int | 80 |
Kubernetes Service port |
service.targetPort | int | 9090 |
Pod expose port |
service.type | string | "ClusterIP" |
Kubernetes Service type. Allowed values: NodePort, LoadBalancer or ClusterIP |
serviceAccount | object | {"annotations":{},"automountServiceAccountToken":false,"create":true,"name":""} |
Enable creation of ServiceAccount |
startupProbe | object | {"enabled":false,"failureThreshold":30,"initialDelaySeconds":180,"periodSeconds":10,"successThreshold":1,"timeoutSeconds":5} |
Configure startupProbe checker Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes |
startupProbeCustom | object | {} |
Custom startupProbe |
tolerations | list | [] |
Tolerations for pod assignment |
volumeMounts | list | [] |
Additional volumeMounts on the output Deployment definition. |
volumes | list | [] |
Additional volumes on the output Deployment definition. |