Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenFeature integration #131

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

see-quick
Copy link
Member

@see-quick see-quick commented Sep 12, 2024

This proposal introduces the potential integration of the Strimzi Kafka operator with the feature flagging system OpenFeature FlagD. Motivation comes from [1].

[1] - strimzi/strimzi-kafka-operator#7520

Signed-off-by: see-quick <maros.orsak159@gmail.com>
Signed-off-by: see-quick <maros.orsak159@gmail.com>
@see-quick see-quick added the enhancement New feature or request label Sep 12, 2024
@see-quick see-quick self-assigned this Sep 12, 2024
Signed-off-by: see-quick <maros.orsak159@gmail.com>
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. But apart from them:

  • I think the abiility to configure the feature gates per-cluster si much more important than the ability to configure the feautre gates dynamically.
  • I don't understand the focus on Flagd. My assumption was that OpenFeature provides us a generic API for feature flagging that works with many existing systems and allows users to bring their own implementations as well. This PR reads as if it is not the case - should we in that case wait for some actual demand for Flagd first? I think it is fine to use Flagd as an example of how it might work with the other systems. But the proposal seems to hardcode it in.
  • I would focus less on including long code segments and more on the actual impact show in shorter examples and/or explained with words.
  • The backward compatibility is not clear to me.

Comment on lines 18 to 22
For the current users (i.e., having classic ENV variable STRIMZI_FEATURE_GATES), we will use [Enviroment Variable Provider](https://github.com/open-feature/java-sdk-contrib/tree/main/providers/env-var)
already implemented by OpenFeature folks. Some of the common characteristics of such provider are:
1. Default method currently used.
2. Feature gates are set via environment variables.
3. Requires rolling updates for changes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You link to an existing provider. But how does it provide the backwards compatibility for how we configure the feature gates?

Copy link
Member Author

@see-quick see-quick Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply by using just STRIMZI_FEATURE_GATES env var in all operators as we do it currently (but such ENV would be moved to FeatureGates class).

084-open-feature-integration.md Outdated Show resolved Hide resolved
Comment on lines 27 to 32
On the other side for the more complex users with need of flagging system, we would support of [FlagD](https://github.com/open-feature/flagd),
which is pretty known because [OpenFeature is CNCF Incubating project](https://www.cncf.io/projects/openfeature/).
Common characteristics of this provider are:
1. New method proposed for more complex users, which needs better management over multiple features (of Strimzi)
2. Integrates with FlagD for dynamic feature flagging
3. Allows changes without the need for rolling updates.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the way it would work is that:

  • We would have our own default provider for how we configure the feature gates
  • Users can use any of the provider (and while we might bundle some, they might also bundle their own)

But you make it sound here like we need to explicitly pick that we are gonna support Flagd?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, it was the easiest option to do it at the start. But I agree it would be better to let for instance in

private static final String FLAGD_ENABLED_ENV_VAR = "FLAGD_ENABLED"; // Environment variable to toggle FlagD

instead of FLAGD_ENABLED here use FlagProvider name and in our code we could add multiple providers based on that ENV-VAR. Here is the list, which is supported by OpenFeature [1]. So there are plenty of them.

[1] - https://mvnrepository.com/artifact/dev.openfeature.contrib.providers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, we can have possibly:

/**
     * Maps the value of OPEN_FEATURE_PROVIDER_NAME to the corresponding provider.
     *
     * @return The corresponding FeatureProvider instance based on the environment variable.
     */
    private FeatureProvider getProviderFromEnv() throws InvalidOptions {
        String providerName = System.getenv(OPEN_FEATURE_PROVIDER_NAME_ENV);

        // Default to EnvVarProvider if the environment variable is not set
        if (providerName == null || providerName.trim().isEmpty()) {
            return new EnvVarProvider();
        }

        // Create a mapping between the environment variable and providers
        Map<String, FeatureProvider> providerMap = new HashMap<>();
        providerMap.put("flagd", new FlagdProvider());
        providerMap.put("env-var", new EnvVarProvider());
        providerMap.put("flagsmith", new FlagsmithProvider());
        providerMap.put("configcat", new ConfigCatProvider());
        providerMap.put("statsig", new StatsigProvider());
        providerMap.put("unleash", new UnleashProvider());
        providerMap.put("jsonlogic", new JsonlogicProvider());
        providerMap.put("flipt", new FliptProvider());
        providerMap.put("go-feature-flag", new GoFeatureFlagProvider());

        // Return the corresponding provider or default to EnvVarProvider
        return providerMap.getOrDefault(providerName.trim().toLowerCase(), new EnvVarProvider());
    }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then everything would be handled by the OpenFeatureAPI Client for every provider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so adding new provider will require code change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but currently OpenFeature has only these providers, which I have listed. So I think it would be just a one-timer change. If in the future a new flagging system comes into the picture (of course then we would need to add it :))

084-open-feature-integration.md Outdated Show resolved Hide resolved
084-open-feature-integration.md Outdated Show resolved Hide resolved
084-open-feature-integration.md Outdated Show resolved Hide resolved
084-open-feature-integration.md Outdated Show resolved Hide resolved
084-open-feature-integration.md Show resolved Hide resolved
@see-quick
Copy link
Member Author

I think the abiility to configure the feature gates per-cluster si much more important than the ability to configure the feautre gates dynamically.

Okay, I will think about how we can achieve it and update the proposal.

I don't understand the focus on Flagd. My assumption was that OpenFeature provides us a generic API for feature flagging that works with many existing systems and allows users to bring their own implementations as well. This PR reads as if it is not the case - should we in that case wait for some actual demand for Flagd first? I think it is fine to use Flagd as an example of how it might work with the other systems. But the proposal seems to hardcode it in.

You are correct OpenFeature provides a generic API for feature flagging I was thinking that we would start with just FlagD support. But looking at this again it's better to let the user decide what provider he wants to use...So I will update the proposal accordingly to be more generic and not focused only on FlagD.

Signed-off-by: see-quick <maros.orsak159@gmail.com>
…re gates configuration

Signed-off-by: see-quick <maros.orsak159@gmail.com>
Signed-off-by: see-quick <maros.orsak159@gmail.com>
Signed-off-by: see-quick <maros.orsak159@gmail.com>
Signed-off-by: see-quick <maros.orsak159@gmail.com>
Signed-off-by: see-quick <maros.orsak159@gmail.com>
Signed-off-by: see-quick <maros.orsak159@gmail.com>
Copy link
Member

@tombentley tombentley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @see-quick, I left a few comments. Right now although I can see some value I'm not completely convinced the advantages outweigh the disadvantages.

| |
| <----------- API Calls -----------> |

Where in each component (i.e., ClusterOperator, UserOperator and TopicOperator), we will fetch feature flags dynamically from the OpenFeature API, which is managed by feature flagging server.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in the communication to the server fails? I guess we're unable to proceed with the reconciliation. Thus the availability of the Strimzi operators becomes coupled to the availability of this feature flagging server. I don't think that's a problem for the proposal, but calling out explicitly for users.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in the communication to the server fails? I guess we're unable to proceed with the reconciliation. Thus the availability of the Strimzi operators becomes coupled to the availability of this feature flagging server

Well this can be handled I think because we would end up in a scenario where:

2024-09-27 09:10:18 ERROR GrpcConnector:149 - failed to connect to event stream, exhausted retries

client (strimzi) wants to reconnect because he is getting a GrpcConnector error. We can catch that and use just the default value of FEATURE_GATE; therefore, we no longer depend on feature flagging server availability. And if such error would disappear (feature flagging system is up) we continue with a normal state.

- if:
- or:
- and:
- "==":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this and operator only has a single operand, which is weird. Is this a mistake/bad example, or is it imposed by the CRD validation schema for the FeatureFlag kind?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a mistake/bad example. The correct should be (locally I have fixed in my example files but didn't update proposal thanks):

targeting:
          if:
            - or:
                - "==":
                   - var: clusterName
                   - "kafka-cluster-a"
                - "==":
                   - var: clusterName
                   - "kafka-cluster-b"
            - "on"
            - "off"


### Potential Challenges

- **Complexity:** Increased complexity in configuration management.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the big one for me. If there were lots of users asking for the functionality that open feature provides then it's much easier to justify the complexity. But AFAIK no one is asking for this. That means the benefits are really internal at this point (the testing benefit).


- **Flexibility:** Users can toggle features without redeploying or restarting services.
- **Faster Iterations/Testing:** Features can be tested and rolled out quickly, speeding up development cycles.
- **Centralized Management:** feature flagging system integration allows centralized control of feature flags, simplifying management across multiple components.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not looked closely at OpenFeature, but externalising the logic used for turning feature gates on and off could be a double edged sword. In particular, if I'm editing a FeatureFlag how do I know what will be affected? It could be 'nothing', or it could be 'everything' (it's not even limited at just being Strimzi things AFAICS). Does Open Feature help in figuring out the impact, or is it up to the user to piece it together?

I also worry a bit about discoverability: When feature flags are part of the operator deployment it's pretty obvious. When they're in some other resource then you need to know about the existence of FeatureFlag (as a thing), as well as discovering which instances of it (could there be multiple ones?) the operator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not looked closely at OpenFeature, but externalising the logic used for turning feature gates on and off could be a double edged sword. In particular, if I'm editing a FeatureFlag how do I know what will be affected? It could be 'nothing', or it could be 'everything' (it's not even limited at just being Strimzi things AFAICS). Does Open Feature help in figuring out the impact, or is it up to the user to piece it together?

It up to the user to piece it together. little offtopic: Maybe one option would have some examples in DOCs but I am not sure about their maintainability in terms that there are plenty of providers (flagd is just one of them). Another point is that OpenFeature Operator supports just FlagD. So other providers would need to run in some form of Deployment and they all have different configurations options.

FeatureFlag (as a thing), as well as discovering which instances of it (could there be multiple ones?) the operator

Yeah you can deploy multiple of them AFAIK. Firstly you have to define FeatureFlagSource e.g.,

apiVersion: core.openfeature.dev/v1beta1
kind: FeatureFlagSource
metadata:
 name: strimzi-feature-gates-source
 namespace: myproject
 labels:
   app: open-feature-demo
spec:
 sources:
   - source: strimzi-feature-gates-flags
     provider: kubernetes

And in source you have to reference your created FeatureFlag CR (i.e., strimzi-feature-gates-flags). But again this is just for FlagD and in case of other provider you would need to configure it differently (and without use of Operator cause their operator for now supports just FlagD as feature flagging server).

Signed-off-by: see-quick <maros.orsak159@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants