Replace the existing bidirectional topic operator with a unidirectional operator.
The Strimzi Topic Operator provides a Kubernetes API for viewing and modifying topics within a Kafka cluster.
It deviates from the standard operator pattern by being bidirectional.
That is, it will reconcile changes to a KafkaTopic
resource both to and from a Kafka cluster.
The bidirectionality means applications and users can continue to use Kafka-native APIs (such as the Admin
client) and tooling (e.g. the scripts provided by Apache Kafka) as required.
The KafkaTopic's spec
will be updated by the operator to reflect those changes.
The Strimzi community often abbreviates the topic operator as TO, but to avoid ambiguity in this document we will refer to the existing bidirectional topic operator as the BTO, and the proposed unidirectional topic operator as the UTO.
The BTO makes use of ZooKeeper znode watches to know about changes to topic state within the Kafka cluster.
But Apache Kafka is removing its ZooKeeper-dependence which means that the BTO won't work with Kafka clusters that use its replacement, known as KRaft.
In order to continue to provide a kube-native API for managing Kafka topics we've considered some alternatives which would work with a KRaft-based Kafka cluster:
- Using polling via the Kafka Admin client.
- Using a KRaft observer.
The polling approach, while simple, has significant drawbacks. It would scale poorly with the number of topics (in terms of CPU and memory), and would also suffer from increased latency between the time a change was made in Kafka and when it was reflected in Kubernetes. The increased latency would widen the window for making conflicting changes.
The observer approach could work, but comes with some serious drawbacks:
- The Kafka project does not provide a publicly-supported API for running a KRaft observer, so we'd need to make use of internal APIs which could change between releases without warning. This represents a potential maintenance burden to the project.
- The operator would require persistent storage (for the replicated __cluster_metadata log), which would make the operator significantly more difficult to operate. For example, if it were realized as a single-pod Deployment, then the requirement for a persistent volume would tie it to a single AZ, meaning it would not be as highly available as the current operator. This could be worked around by having multiple observers with only a single elected leader actually in charge of making modifications at any one time.
Overall, it is clear that the complexity of the operator would be significantly increased by pursuing this direction.
It's also worth reviewing why people want to use an operator for topics. There are two broad use cases:
1. Declarative deployments, for example in the form of gitops.
2. Using KafkaTopics operationally as the API through which to manage all topics.

Based on opened issues and Slack conversations with users, it seems that 1. is the most common reason people use the BTO. However, the BTO doesn't do a great job at satisfying this use case.
- It only supports a single namespace, which means BTO admins cannot easily follow a workspace-per-dev-team model. It also means that infrastructure-level topics (e.g. __consumer_offsets and __transaction_state) commingle with per-application topics.
- It doesn't enforce a single source of truth, allowing changes made directly in Kafka to conflict with declarative deployments.
Moreover, the BTO doesn't do a good job with item 2. either:
- It only surfaces a subset of topic state through the Kube API (it doesn't include partition leadership/ISR, for example). This is intentional (it doesn't make sense to overload etcd to present this kind of dynamic information), but it undermines the suitability of the BTO for this use case.
For these reasons we are proposing the UTO.
- In scope: a unidirectional topic operator which is KRaft compatible.
- Out of scope: supporting replication factor change (this could be added later).
We would remain schema-compatible with the existing KafkaTopic
resource at the API level.
This means that all the existing spec
sections could be handled by the new operator and all existing fields of the status
section would still be present for applications that consume the status.
However the spec
would no longer be updated when Kafka-side configuration changed.
Instead, if Kafka-side configuration was changed out-of-band (e.g. by using the Admin
client, or Kafka shell scripts) those changes would eventually (depending on the timed reconciliation interval) be reverted.
As context for the rest of this document, an example KafkaTopic
currently looks like this:
```yaml
kind: KafkaTopic
metadata:
  generation: 123
  name: example.topic
spec:
  topicName: example_topic
  replicas: 3
  partitions: 50
  config:
    retention.ms: 123456789
status:
  topicName: example_topic
  observedGeneration: 123
  conditions:
  - type: Ready
    status: True
    lastTransitionTime: 20230301T103000Z
```
Where:
- spec.topicName is optional and need only be used for topic names which are not legal Kube resource names. When this field is absent the name of the topic in Kafka is taken from the metadata.name. In the example above of a KafkaTopic managing the example_topic topic, the metadata.name cannot be example_topic because that's not allowed as a resource name in Kube, so the metadata.name must be something else and spec.topicName is required.
- status.topicName is used to detect a change to the spec.topicName, which results in the topic having a Ready condition with status = False (i.e. an error).
- metadata.generation == status.observedGeneration implies that the operator has reconciled the latest change to the spec. When metadata.generation > status.observedGeneration it means the resource has been changed, but the operator has not yet reacted to the change. Whether that reconciliation was successful is indicated by the Ready condition having status True.

Some new fields will be introduced in the following text.
To avoid ambiguity we will continue to require that a single KafkaTopic
is used to manage a single topic in a single Kafka cluster.
The fact that there are legal topic names (in Kafka) that are not legal resource names (in Kube) provided the motivation for supporting spec.topicName
, so that a topic in Kafka could be managed by a KafkaTopic
with a different metadata.name
.
As such it's not possible for the Kube apiserver to enforce the single-resource principle, because multiple KafkaTopic resources might refer to the same topic (i.e. the uniqueness constraint on metadata.name is insufficient).
It is therefore left to the operator to detect if there are multiple KafkaTopics
which point to the same topic in Kafka.
To do this, the operator will keep an in-memory mapping of topic name to (metadata.namespace
, metadata.name
)-pairs (where topic name is spec.topicName
, defaulting to metadata.name
as described above).
This will allow us to detect the case where multiple resources are attempting to manage the same topic in Kafka.
When this happens the unique oldest (as determined by the metadata.creationTimestamp
) KafkaTopic
will be considered to manage the topic, and the other KafkaTopics
will be updated: their Ready status will change to False with a suitable reason. For example:
```yaml
kind: KafkaTopic
metadata:
  generation: 123
  namespace: some-namespace
  name: foo
  creationTimestamp: 20230301T103000Z
status:
  topicName: __consumer_offsets
  observedGeneration: 123
  conditions:
  - type: Ready
    status: True
    lastTransitionTime: 20230301T103000Z
---
kind: KafkaTopic
metadata:
  generation: 456
  namespace: some-namespace
  name: bar
  creationTimestamp: 20230420T103000Z
status:
  topicName: __consumer_offsets
  observedGeneration: 456
  conditions:
  - type: Ready
    status: False
    reason: ResourceConflict
    message: Managed by some-namespace/foo
    lastTransitionTime: 20230420T103000Z
```
As mentioned, we would only synchronize topic state from Kube to Kafka.
Users and applications could create, delete and modify topics using the Kafka protocol (e.g. via the Admin
client) which would not be reflected by a KafkaTopic
in Kube.
In other words, the existence of at least some KafkaTopic
resources would be decoupled from the existence of a corresponding topic in Kafka.
While users could choose to create a topic in Kafka by creating a KafkaTopic
, they could also create, modify and eventually delete topics directly without a KafkaTopic
ever existing.
When a KafkaTopic
did exist for a given topic in Kafka then any changes made via other tooling would eventually be reverted.
We will refer to a topic which exists in a Kafka cluster and has a corresponding KafkaTopic
as a managed topic.
A topic which exists in a Kafka cluster without a matching KafkaTopic
is an unmanaged topic.
Naturally any topic created due to the creation of a KafkaTopic
is managed from the outset.
Because topics can be created directly in Kafka we need a mechanism for converting an unmanaged topic to a managed topic, aka "managing" the topic.
This can easily be done by creating a matching KafkaTopic
: The operator will attempt the CreateTopics
request using the Kafka Admin client and if it receives a TOPIC_ALREADY_EXISTS
error will proceed with reconciliation as it would for a KafkaTopic
modification.
That is, the configuration of the topic in Kafka will be changed to match the KafkaTopic
(or fail for the same possible reasons and with the same error conditions in the status
as can happen for any update).
In more detail, the reconciliation of a newly created KafkaTopic proceeds as follows:

1. The user creates a KafkaTopic:

   ```yaml
   metadata:
     generation: 1
     name: foo
   spec:
     partitions: 12
     replicas: 50
   ```

2. The operator checks its in-memory map to see whether foo is already managed by some other resource.
3. If so, then the metadata.creationTimestamps are compared and, if this resource has the more recent timestamp, it is updated with an error condition in its status:

   ```yaml
   kind: KafkaTopic
   metadata:
     generation: 1
     name: foo
   status:
     topicName: foo
     observedGeneration: 1
     conditions:
     - type: Ready
       status: False
       reason: ResourceConflict
       message: Managed by multiple KafkaTopic resources: some-namespace/bar
       lastTransitionTime: 20230301T103000Z
   ```

   The reconciliation then ends.
4. Otherwise the operator attempts to create the topic in Kafka. If the CreateTopics request fails due to TOPIC_ALREADY_EXISTS then the operator updates the topic in Kafka to reflect the spec using incrementalAlterConfigs() and/or createPartitions().
5. The status is updated to reflect the Admin client requests. In the happy path it would look like this:

   ```yaml
   metadata:
     generation: 1
     name: foo
   spec:
     topicName: foo
     partitions: 12
     replicas: 50
   status:
     observedGeneration: 1
     topicName: foo
     conditions:
     - type: Ready
       status: True
       lastTransitionTime: 20230301T103000Z
   ```

Via a Kube watch, and also based on a timer, the operator will
ensure that the state of a managed topic in a Kafka cluster reflects the spec
of its KafkaTopic
.
The process will first describe the topic and its configs and then make alterations, if required.
- Modification to topic config will be via Admin.incrementalAlterConfigs() using the SET operation.
- Creation of partitions will be supported.
- Deletion of partitions is not supported by Kafka and will result in a suitable status.condition in the KafkaTopic. The user will then need to decide how to proceed, but typically they might just revert the spec.partitions.

  ```yaml
  status:
    conditions:
    - type: Ready
      status: False
      reason: NotSupported
      message: Decrease of spec.partitions is not supported by Kafka
      lastTransitionTime: 20230301T103000Z
  ```

- Changes to spec.replicas could be supported via Cruise Control, but that is not part of this proposal.

  ```yaml
  status:
    conditions:
    - type: Ready
      status: False
      reason: NotSupported
      message: Changing spec.replicas is not supported by the operator
      lastTransitionTime: 20230301T103000Z
  ```

Note that both of these NotSupported reasons can arise either from changes made to the KafkaTopic or from changes made directly in Kafka. For example, the UTO cannot tell whether the spec.partitions was decreased, or whether it was unchanged and the conflict arises because someone increased the number of partitions directly in Kafka so that it looks like the KafkaTopic is requesting a decrease. In any case, when the user takes action to fix the problem the KafkaTopic will (eventually) get reconciled and the status will revert to Ready.
By default we will use a Kube finalizer for deletion. This means that deletion via the Kube REST API will first mark the resource as scheduled for deletion by setting the metadata.deletionTimestamp, allowing the operator to handle the deletion and then update the KafkaTopic, removing its metadata.finalizer so that the resource is actually removed from etcd.
Without a finalizer, the following would be possible:

- A topic in Kafka is managed by a KafkaTopic
- The operator stops (for any reason)
- The KafkaTopic is deleted
- The operator starts

The semantics of a managed topic being deleted should be that the topic in Kafka gets deleted, but that fails to happen in this case.
The operator cannot use the existence of a topic in Kafka to infer that the KafkaTopic
has been deleted because topics in Kafka can be unmanaged.
A small extra benefit of using a finalizer is that any errors during deletion can be reported via the KafkaTopic
's status
.
```yaml
status:
  conditions:
  - type: Ready
    status: False
    reason: KafkaError
    message: Deletion failed: ${Error_Message}
    lastTransitionTime: 20230301T103000Z
```
While the behaviour described above will be the default, it will be possible to opt out of it using the UTO's new STRIMZI_USE_FINALIZER config parameter.
- STRIMZI_USE_FINALIZER=true: The presence of the strimzi.io/topic-operator finalizer will be checked (and added, if missing) on every reconciliation where the metadata.deletionTimestamp isn't set, including unmanaged topics. The operator will not add or remove the finalizer on KafkaTopics which do not match its label selector.
- STRIMZI_USE_FINALIZER=false: The operator will remove the strimzi.io/topic-operator finalizer if present.

Consider the deletion of a managed topic. The topic is currently Kube-managed:

```yaml
metadata:
  generation: 123
  name: foo
  finalizer:
  - strimzi.io/topic-operator
spec:
  # ...
status:
  observedGeneration: 123 # == generation => it is Kube-managed
```
The user then deletes the resource (e.g. kubectl delete kafkatopic foo). Because of the presence of the metadata.finalizer the resource remains present. Instead Kube adds the metadata.deletionTimestamp field.

```yaml
metadata:
  generation: 124
  name: foo
  finalizer:
  - strimzi.io/topic-operator
  deletionTimestamp: 20230301T000000.000
spec:
  # ...
status:
  observedGeneration: 124 # == generation => it is not synchronized
```
The operator notices the metadata.deletionTimestamp field.

- If there is more than one KafkaTopic which is trying to manage this topic then deletion from Kafka is not attempted; the operator will remove the finalizer and Kube will remove the resource; upon timed reconciliation the condition on the other KafkaTopic will be removed and reconciliation via that other KafkaTopic will proceed as normal.
- If deletion is disallowed via the delete.topic.enable=false broker configuration then the operator will remove the finalizer and Kube will remove the resource. The topic in Kafka will thus become unmanaged.
- Otherwise, the operator attempts to delete the topic from Kafka.
  - If deletion succeeds the operator will remove strimzi.io/topic-operator from the metadata.finalizer and Kube will remove the resource.
  - If deletion failed with UNKNOWN_TOPIC_OR_PARTITION (i.e. the topic didn't exist) the operator will still remove the finalizer, i.e. this case is not treated as an error.
  - If deletion failed with any other error this is reported via a condition in the status.conditions and the finalizer is not removed.

Similarly to how it is possible to "manage" an existing unmanaged topic in Kafka, we will also provide a mechanism for "unmanaging" a managed topic, via a strimzi.io/managed=false
annotation.
This will allow a KafkaTopic
to be deleted without the topic in Kafka being deleted.
This can be useful operationally, for example to change the metadata.name
of a KafkaTopic
.
An alternative to strimzi.io/managed=false would be "explicit deletion" via a spec.delete flag. This alternative was rejected because it's incompatible with gitops.
Assume the topic is currently Kube-managed:

```yaml
metadata:
  generation: 123
  name: foo
  finalizer:
  - strimzi.io/topic-operator
spec:
  # ...
status:
  observedGeneration: 123 # == generation => it is Kube-managed
```
The user sets the strimzi.io/managed annotation to false so that the topic is no longer Kube-managed.

```yaml
metadata:
  generation: 124
  name: foo
  finalizer:
  - strimzi.io/topic-operator
  annotations:
    strimzi.io/managed: false # do not synchronize to Kafka
spec:
  # ...
status:
  observedGeneration: 123 # != generation => it is Kube-managed
```
The user can then delete the resource. Because of the presence of the metadata.finalizer the resource is not actually deleted. Instead Kube adds the metadata.deletionTimestamp field.

```yaml
metadata:
  generation: 124
  name: foo
  finalizer:
  - strimzi.io/topic-operator
  annotations:
    strimzi.io/managed: false # do not synchronize to Kafka
  deletionTimestamp: 20230301T000000.000
spec:
  # ...
status:
  observedGeneration: 124 # == generation => it is not synchronized
```
- The operator notices the metadata.deletionTimestamp field. Since strimzi.io/managed is false it does not attempt topic deletion.
- It removes strimzi.io/topic-operator from the metadata.finalizer.
- Kube proceeds to remove the resource.

Note that strimzi.io/managed: false
also means that a KafkaTopic
can exist without there being a corresponding topic in Kafka.
Changing the metadata.name of a KafkaTopic WILL be supported (see the sketch following this list), by:

1. unmanaging the topic (strimzi.io/managed: false).
2. deleting the KafkaTopic.
3. recreating the KafkaTopic with a new metadata.name, but with spec.topicName referring to the same topic as the original KafkaTopic.

Changing the spec.topicName will NOT be supported.
It will be detected by a mismatch between spec.topicName
and status.topicName
.
Changes thus detected will result in an error condition.
Decreasing spec.partitions will NOT be supported, because it's not supported by Kafka.
The user will either need to revert the spec.partitions
, or recreate the topic in Kafka (which they can do via the KafkaTopic
or not).
Changing spec.replicas will NOT be supported as part of this proposal.
Consider an application deployed via a manifest containing a Deployment
and some KafkaTopics
.
If the Kafka cluster is configured with auto.create.topics.enable
there is a race condition between:
- the operator reacting to the new KafkaTopics.
- the application starting and triggering auto-creation of the topics.

If the operator wins: then auto-creation will not happen, and the user's intent is honoured.
If the application wins: then the topics will be created using the default configuration for the cluster. The operator will later reconcile the KafkaTopics
which will result in reconfiguration if the default configuration for auto-created topics differs from the KafkaTopic.spec
. That reconfiguration will either succeed, or fail (e.g. because a KafkaTopic
has a lower spec.partitions
than the auto-created topic).
To avoid this it is recommended that the UTO not be used with a cluster where auto.create.topics.enable
is true
.
The UTO will emit a WARN
level log on startup if autocreation is enabled.
Alternatively, the application could be written to wait for the topic to exist, or could use an init container to do the same; a sketch follows.
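For example, a hypothetical init container that blocks application start-up until the topic exists might look like this (the image, script path and bootstrap address are assumptions for illustration, not part of this proposal):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app   # hypothetical application
spec:
  # ... (selector, labels and other required fields omitted)
  template:
    spec:
      initContainers:
      - name: wait-for-topic
        image: example/kafka-tools:latest   # any image providing the Kafka CLI scripts
        command:
        - sh
        - -c
        - |
          # kafka-topics.sh exits non-zero while the topic does not yet exist
          until /opt/kafka/bin/kafka-topics.sh \
              --bootstrap-server my-cluster-kafka-bootstrap:9092 \
              --describe --topic example_topic > /dev/null 2>&1; do
            echo "waiting for topic example_topic"
            sleep 5
          done
      containers:
      - name: my-app
        image: example/my-app:latest
```
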
This proposal has described a number of different configurations of a KafkaTopic and how the operator will handle them. The following diagram and subsections summarise the operator's behaviour on transitions between these states.
Although not shown to avoid making the diagram overly complicated, the states within the dashed box are pairwise bidirectionally connected.
- The operator's label selector (STRIMZI_RESOURCE_LABELS) doesn't match the KafkaTopic's metadata.labels.
- The operator completely ignores the resource (it might be intended for a different instance of the UTO).
- On transition from start: The metadata.finalizer is not added.
- On transition from any state in the dashed box, the metadata.finalizer is not changed (if it were, then changing the metadata.labels to unmatch operator instance A and match operator instance B would result in a race between A removing the finalizer and B adding it).
- On deletion any finalizer is not removed, therefore this edge case might result in a zombie KafkaTopic.

- This is strimzi.io/managed: false.
- The operator doesn't propagate changes to Kafka.
- The presence of the finalizer will be checked for on each reconciliation of the resource.
- Deletion does not result in deletion of any topics in Kafka (which is why it's not connected to the Deletable 1 state).
- On transition to Deletable 2: The operator will remove the finalizer.
- On transition to any of the other states in the dashed box: The state of the topic in Kafka will be changed to match the spec.

- This is the normal case where changes are propagated to Kafka.
- On transition to Deletable 1: The topic is deleted from Kafka, and then the finalizer is removed.
- Transitions to Unmanaged if strimzi.io/managed is changed to false.
- Transitions to "Managed Conflicting First" if another KafkaTopic is created for the same topic in Kafka.
- Transitions to "Managed Conflicting Rest" in a race case, rare in practice, where a second KafkaTopic is created for the same topic in Kafka with the same metadata.creationTimestamp.

- This is the case where there are multiple KafkaTopics for the same topic in Kafka, and this one has the unique oldest metadata.creationTimestamp.
- It behaves the same as the "Managed not conflicting" case except with respect to deletion.
- Since other KafkaTopics exist for the same topic, deletion of this KafkaTopic does not result in deletion from Kafka.

- This is the case where there are multiple KafkaTopics for the same topic in Kafka, and this one does not have the unique oldest metadata.creationTimestamp.
- There may, or may not, be another KafkaTopic for the same topic in Kafka in state "Managed conflicting first".
- Changes in this state, including deletions, are not propagated to Kafka.

- The existing bidirectional Topic Operator would be deprecated and removed once ZooKeeper-based clusters are no longer supported.
- A new unidirectional Topic Operator would be provided as a replacement, which users could migrate to before ZooKeeper-based clusters are no longer supported.
- The schema of KafkaTopic would change.
- The schema of Kafka would also change (see below).
- The CRD changes would be reflected in the api module.

Let's consider each of the public APIs of the BTO in turn and describe the compatibility story.
Env var | Status |
---|---|
STRIMZI_NAMESPACE | Unchanged |
STRIMZI_RESOURCE_LABELS | Unchanged |
STRIMZI_KAFKA_BOOTSTRAP_SERVERS | Unchanged |
STRIMZI_CLIENT_ID | Unchanged |
STRIMZI_FULL_RECONCILIATION_INTERVAL_MS | Unchanged |
STRIMZI_TLS_ENABLED | Unchanged |
STRIMZI_TRUSTSTORE_LOCATION | Unchanged |
STRIMZI_TRUSTSTORE_PASSWORD | Unchanged |
STRIMZI_KEYSTORE_LOCATION | Unchanged |
STRIMZI_KEYSTORE_PASSWORD | Unchanged |
STRIMZI_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM | Unchanged |
STRIMZI_SASL_ENABLED | Unchanged |
STRIMZI_SASL_MECHANISM | Unchanged |
STRIMZI_SASL_USERNAME | Unchanged |
STRIMZI_SASL_PASSWORD | Unchanged |
STRIMZI_SECURITY_PROTOCOL | Unchanged |
STRIMZI_ZOOKEEPER_CONNECT | Dropped |
STRIMZI_ZOOKEEPER_SESSION_TIMEOUT_S | Dropped |
TC_ZK_CONNECTION_TIMEOUT_MS | Dropped |
STRIMZI_REASSIGN_THROTTLE | Dropped, see note 1 |
STRIMZI_REASSIGN_VERIFY_INTERVAL_MS | Dropped, see note 1 |
STRIMZI_TOPIC_METADATA_MAX_ATTEMPTS | Dropped |
STRIMZI_TOPICS_PATH | Dropped |
STRIMZI_STORE_TOPIC | Dropped |
STRIMZI_STORE_NAME | Dropped |
STRIMZI_APPLICATION_ID | Dropped |
STRIMZI_STALE_RESULT_TIMEOUT_MS | Dropped |
STRIMZI_USE_ZOOKEEPER_TOPIC_STORE | Dropped |
STRIMZI_USE_FINALIZER | Added |
Notes:
1. Although these are present in the BTO source code they have never actually been used.
It is also anticipated that during development we may want to add support for additional environment variables to configure non-functional aspects of the UTO (e.g. performance-related options like the number of threads used for processing, etc.).
The Topic Operator is usually deployed using the Kafka
CR (i.e. the Cluster Operator deploys the Topic Operator).
Here is how the schema for the Kafka
CR will be changed by this proposal:
```yaml
kind: Kafka
metadata:
  name: my-cluster
spec:
  # ...
  topicOperator:
    # Unused options will be deprecated and ignored
    topicMetadataMaxAttempts:
    zookeeperSessionTimeoutSeconds:
    # Options with same schema, but which may need to be reconfigured
    livenessProbe: # probe values may need to be changed
    readinessProbe: # probe values may need to be changed
    startupProbe: # probe values may need to be changed
    resources: # Resource requirements may differ
    jvmOptions: # JVM options may need to be changed
    logging: # Logger names will need to be changed
    # Unchanged options
    image:
    watchedNamespace:
    reconciliationIntervalSeconds:
```
The CR API is structurally unchanged.
Furthermore, within the same structure:

- status.conditions may have new values for the reason.

The semantics of the KafkaTopic
custom resource API will also change:
1. Unidirectional reconciliation.
2. The absence of fields will not imply that they take their default values, only that the value is not specified in this CR.

The reason for 2 is to leave the door open to a further refinement of the UTO in the future. Consider the case where a Kafka cluster is centrally managed by an infrastructure team. The infra team want to delegate control over only a subset of the possible topic configs. Configs such as cleanup.policy and compression.type can be delegated to the application teams, but the infra team want to retain control over configs such as follower.replication.throttled.replicas, min.cleanable.dirty.ratio or unclean.leader.election.enable. So long as these subsets are disjoint there is no possibility of conflict in allowing two KafkaTopics in different namespaces to manage the different aspects of a single topic in Kafka. The disjointness could be enforced by a more sophisticated namespace policy which enumerated the configs. The same logic also applies to spec.partitions and spec.replicas.

There would be the possibility of conflict around resource existence. That is, how should the operator behave if the application team delete their KafkaTopic while a KafkaTopic managing the same topic remains in the infra team's namespace? This proposal doesn't attempt to answer that, since it's not proposed to implement it currently. But in order to allow this possibility without introducing an incompatible change in the future we need to define the semantics of fields that are unspecified in the spec to mean "unspecified by this CR" rather than "takes the default value".
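Purely as an illustration of that hypothetical future refinement (it is not part of this proposal), two such KafkaTopics might look like this, with the application team owning the delegated configs and the infra team owning the rest (namespace names and config values are illustrative):

```yaml
# Application team's namespace: delegated configs only
kind: KafkaTopic
metadata:
  namespace: app-team
  name: example.topic
spec:
  topicName: example_topic
  config:
    cleanup.policy: compact
    compression.type: lz4
---
# Infrastructure team's namespace: reserved configs plus partitions and replicas
kind: KafkaTopic
metadata:
  namespace: infra-team
  name: example.topic
spec:
  topicName: example_topic
  partitions: 50
  replicas: 3
  config:
    unclean.leader.election.enable: false
    min.cleanable.dirty.ratio: 0.3
```

Under the semantics defined above, the configs absent from each spec are simply unspecified by that CR, so the two resources would not conflict.
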
The BTO exports the following metrics
Metric type | Metric name | Status |
---|---|---|
counter | strimzi.reconciliations.periodical | Unchanged |
counter | strimzi.reconciliations | Unchanged |
counter | strimzi.reconciliations.failed | Unchanged |
counter | strimzi.reconciliations.successful | Unchanged |
gauge | strimzi.resources | Unchanged |
timer | strimzi.reconciliations.duration | Unchanged |
gauge | strimzi.reconciliations.paused | Unchanged |
counter | strimzi.reconciliations.locked | Unchanged |
gauge | strimzi.resource.state | Unchanged |
Let's consider all the cases in which the BTO emits Event
resources to the Kube apiserver:
- For errors when accessing the topic store. For UTO there is no topic store, so this cannot happen.
- For errors when creating partitions. For UTO, ApiExceptions from the Admin client will be reported via the Ready condition. For other errors application logging and metrics are considered sufficient.
- For errors when changing the configuration of topics in Kafka. For UTO, ApiExceptions from the Admin client will be reported via the Ready condition. For other errors application logging and metrics are considered sufficient.
- For errors when processing a ZooKeeper znode watch. UTO doesn't use znode watches, so this cannot happen.
- On attempts to change a KafkaTopic's spec.topicName. For UTO this will be reported via the status only.
- On detection of conflicting changes between Kube and Kafka. For UTO such conflicts cannot happen.
- On attempts to decrease the number of partitions. For UTO this will be reported via the status only.
Thus, the UTO will emit no events. Any software which consumed BTO events will not be compatible with UTO.
Existing users of the standalone BTO would have to:
- Review how they're using the BTO and whether they require bidirectionality
  - If bidirectional support is required by their usage then the UTO cannot be used. The user will have no way of managing topics using KafkaTopic resources once ZooKeeper support is removed by Apache Kafka and/or Strimzi.
- Undeploy the BTO (they can retain their existing KafkaTopics).
- Deploy the UTO (reconfiguring their Deployment).
- Some reconfiguration of pod resources and JVM options may also be required.

A new feature gate will be introduced to allow users to adopt the UTO at their own pace.
The new feature gate will be named UnidirectionalTopicOperator.
The following table shows the expected graduation of the UnidirectionalTopicOperator
feature gate:
Phase | Strimzi versions | Default state |
---|---|---|
Alpha | 0.35 - 0.37 | Disabled by default |
Beta | 0.38 - 0.39 | Enabled by default |
GA | 0.40 and newer | Enabled by default (without possibility to disable it) |
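Assuming the Cluster Operator's existing STRIMZI_FEATURE_GATES mechanism is used (a sketch, not a definitive configuration), enabling the gate during the alpha phase might look like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: strimzi-cluster-operator
spec:
  # ...
  template:
    spec:
      containers:
      - name: strimzi-cluster-operator
        # ...
        env:
        - name: STRIMZI_FEATURE_GATES
          value: "+UnidirectionalTopicOperator"
```
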
Using finalizers on KafkaTopics
means that the operator has to be running (to remove the finalizer) before the resources will actually be deleted.
That can prevent other resources from being removed (for example the Namespace
containing the KafkaTopics
).
In these cases it is possible to remove the finalizer manually (e.g. via kubectl edit
), which will then allow the apiserver to remove any other resources.
This approach is not easily compatible with other topic management tooling.
This includes things like the kafka-configs.sh
and other Apache Kafka scripts as well as any other tooling which uses the Admin
client, including Kafka applications themselves.
In the particular case of a hypothetical "Admin server" that was providing CLI or UI functionality in a Strimzi-aware way, it might be possible to wrap Admin (decorator pattern), or to write it using some abstraction over Admin which could also support operations on KafkaTopics for those interactions which would otherwise result in conflicting sources of truth.

Alternatively it might be possible to develop a Kafka-aware proxy to redirect Kafka protocol messages which changed topic state to the Kube API, along with a Topic Operator which had a privileged connection that was not proxied in this way. This would result in Kube being the effective source of truth for topic state.
This proposal aims to set the direction and define the semantics of the UTO. It is not intended, by itself, to define the complete post-BTO picture. Future work could include:
- Providing tooling to conveniently create KafkaTopics for existing topics in Kafka.
- Providing tooling to conveniently remove finalizers from KafkaTopics.

This has been investigated, at length, as described above. While it would be technically possible to support bidirectionality, it's not clear that many users require it in practice. There is very significant complexity in supporting bidirectionality, which would manifest as a long-term supportability burden for the Strimzi project. Adopting the UTO allows us to better support the common use case (gitops), and to add long sought-after features (such as support for multiple namespaces).
This has been discounted for the reasons described in the motivation section.