TEP-0138: Improve Tekton Pipeline Feature Flags
This commit adds the problem statement for improving Tekton Pipeline Feature
Flags.
JeromeJu committed Jul 10, 2023
1 parent cf8cd54 commit 9f9add6
Showing 1 changed file with 69 additions and 0 deletions.
69 changes: 69 additions & 0 deletions teps/0138-improve-tekton-pipeline-feature-flag.md
@@ -0,0 +1,69 @@
---
status: proposed
title: Improve Tekton Pipeline Feature Flags
creation-date: '2023-07-07'
last-updated: '2023-07-07'
authors:
- '@JeromeJu'
- '@chitrangpatel'
- '@lbernick'
---

# TEP-0138: Improve Tekton Pipeline Feature Flags

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)

## Summary

This document proposes updating Tekton Pipelines' feature flags design, as originally proposed in [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md), to address problems caused by the coupling of API versioning and feature versioning, and to provide a better experience for cluster operators.

## Motivation

Two design decisions from [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md), grouping feature flags and coupling feature stability to `apiVersion`, have created challenges for cluster operators, users, and maintainers when using the existing feature flags.

### Grouped feature flags

New features in Tekton start at `alpha` stability and can be promoted to `beta` or `stable`. Today, a single feature flag, `enable-api-fields`, enables all fields in the Tekton API that support features at a given stability level. `enable-api-fields` can be set to `alpha`, `beta`, or `stable`, specifying the minimum stability level for any fields used in Tekton APIs on that cluster. In addition, features that aren't controlled by fields in Tekton APIs have their own feature flags. For more information, see [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md). This approach has led to the following pain points:

- Cluster operators can't enable individual `alpha` or `beta` features controlled by API fields.
  - Since features at a lower stability level tend to have more bugs, cluster operators and vendors may want to limit usage of lower-stability features they do not need, while still being able to enable individual features for specific use cases.

- Cluster operators can't choose a default stability level for features that are not controlled by API fields.
  - Some behavioral feature flags, such as `enforce-nonfalsifiability`, have also been gated by `enable-api-fields`, which forces cluster operators interested in these features to set two flags for a single feature.
  - Other behavioral feature flags are not gated by `enable-api-fields`. For these feature flags, cluster operators can't opt into a default stability level. For example, if a cluster operator wants to enable only `stable` features, and maintainers enable a behavioral feature by default when moving it to `beta`, the cluster operator needs to read the release notes to become aware of the feature and disable it.
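
The all-or-nothing nature of the grouped flag can be illustrated with a minimal Go sketch. It is not the Tekton implementation: the `Stability` type, the `fieldAllowed` helper, and the feature names are hypothetical, and the model deliberately ignores the API-version coupling discussed in the next section. It only shows that a single minimum-stability threshold cannot admit one `alpha` feature without admitting all of them.

```go
package main

import "fmt"

// Stability is a feature's maturity level, ordered from least to most stable.
// This type is hypothetical and exists only for illustration.
type Stability int

const (
	Alpha Stability = iota
	Beta
	Stable
)

// parseStability maps an enable-api-fields value to a Stability.
func parseStability(s string) (Stability, error) {
	switch s {
	case "alpha":
		return Alpha, nil
	case "beta":
		return Beta, nil
	case "stable":
		return Stable, nil
	}
	return Stable, fmt.Errorf("unknown stability level %q", s)
}

// fieldAllowed reports whether an API field whose feature is at stability
// `feature` may be used when enable-api-fields is set to `minLevel`.
// The flag acts as a minimum: everything at or above it is allowed.
func fieldAllowed(minLevel, feature Stability) bool {
	return feature >= minLevel
}

func main() {
	minLevel, err := parseStability("beta")
	if err != nil {
		panic(err)
	}

	// Hypothetical features; the names are placeholders, not real Tekton features.
	features := []struct {
		name  string
		level Stability
	}{
		{"feature-a", Alpha},
		{"feature-b", Alpha},
		{"feature-c", Beta},
		{"feature-d", Stable},
	}

	for _, f := range features {
		fmt.Printf("%s allowed: %t\n", f.name, fieldAllowed(minLevel, f.level))
	}
}
```

In this model, the only way to allow `feature-a` is to lower the threshold to `alpha`, which also allows `feature-b`. That is the first pain point listed above.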

### Feature stability coupled to CRD API version

Per [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md), the behavior of `enable-api-fields` depends on the CRD API version being used. In v1beta1 CRDs, `beta` features can be enabled by setting `enable-api-fields` to `beta` or to `stable`, but in v1 CRDs, `beta` features can only be enabled by setting `enable-api-fields` to `beta`. This couples API versioning to feature stability, and has led to the following pain points:

- [Feedback indicates](https://github.com/tektoncd/pipeline/issues/6592#issuecomment-1533268522) that users upgrading their CRDs from v1beta1 to v1 were confused to find that `beta` features that worked by default in v1beta1 did not work by default in v1 when `enable-api-fields` was set to `stable` (its default value). This is particularly confusing for users who are not cluster operators and cannot control the value of `enable-api-fields`, all the more so if they are unaware that they are using `beta` features.

- For maintainers, the maintenance operation of swapping the storage version from v1beta1 to v1 should not have affected our users. However, we had to [change the user-facing default value of `enable-api-fields` from `stable` to `beta`](https://github.com/tektoncd/pipeline/pull/6732) before changing the storage version of the API to [avoid breaking PipelineRuns using `beta` features](https://github.com/tektoncd/pipeline/pull/6444#issuecomment-1580926707).
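
The version-dependent behavior described in this section can be written as a small decision function. The following Go sketch is illustrative only and is not taken from the Tekton codebase: `betaFieldAllowed` is a hypothetical helper, and treating `alpha` as also enabling `beta` features is an assumption that the text above does not state.

```go
package main

import "fmt"

// betaFieldAllowed models whether a beta feature can be used for a given CRD
// API version and enable-api-fields value. Hypothetical helper for
// illustration; not part of Tekton.
func betaFieldAllowed(apiVersion, enableAPIFields string) bool {
	switch apiVersion {
	case "v1beta1":
		// From the text: "beta" or "stable" enables beta features in v1beta1.
		// Assumption: "alpha" enables them as well.
		return enableAPIFields == "alpha" ||
			enableAPIFields == "beta" ||
			enableAPIFields == "stable"
	case "v1":
		// From the text: only "beta" enables beta features in v1.
		// Assumption: "alpha" enables them as well.
		return enableAPIFields == "alpha" || enableAPIFields == "beta"
	}
	// Unknown API versions are rejected in this sketch.
	return false
}

func main() {
	// The same flag value yields different results depending on apiVersion.
	for _, version := range []string{"v1beta1", "v1"} {
		fmt.Printf("%s, enable-api-fields=stable: beta fields allowed = %t\n",
			version, betaFieldAllowed(version, "stable"))
	}
}
```

Because the result depends on `apiVersion` as well as the flag, a user who changes only the API version of a resource can see a `beta` feature stop working without any change to cluster configuration.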

## Goals

- [Decouple API versioning and feature stability](https://github.com/tektoncd/pipeline/issues/6592)
  - Feature validation and implementation should be the same for all API versions.
- Cluster operators can enable an individual feature without being forced to opt into other features at the same stability level.
- Increase the visibility of stability levels for features that have their own feature flags and for features that are not tied to the API spec.
- Describe a testing strategy that will give us confidence in our implementation.
  - One of the main reasons why [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md#pros-and-cons) chose the existing style of feature flags was to have a simpler testing matrix. With upgrades to the feature versioning, we need to ensure that our testing strategy gives us confidence in our feature coverage.
  - Restructure the conditions for integration tests to avoid [Skipped Integration Tests](https://github.com/tektoncd/pipeline/issues/6079) caused by the different sets of behavioral feature flags required at certain stability levels.
- Minimize backwards incompatible changes and provide simple migration paths for cluster operators.
- Cluster operators can set a default feature stability level.
  - For example, consider a cluster operator who wants to use only `stable` features. If maintainers make `beta` the default stability level for opt-in features, the operator currently has to read through the release notes and change the default stability level back in order to use only `stable` features.

## Non-Goals

- Better guidance on feature promotion and when features can be promoted
  - This is a nice-to-have but not necessarily a blocker, since the feature graduation process should not affect the implementation of how features are enabled.
- Separation of controls for cluster operators and end users
  - On a cluster that serves multiple teams, if a cluster operator enables or disables one feature, then all the teams are affected. This might break pipelines or tasks for teams that rely on that feature. For example, if the cluster operator transitions from `alpha` to `beta` or `stable` features, then users who currently rely on `alpha` features can no longer use them.
  - [TEP-0085: Per-Namespace Feature Flags](https://github.com/tektoncd/community/blob/main/teps/0085-per-namespace-controller-configuration.md) is better suited to address this problem. This proposal focuses on the user experience of configuring the Tekton Pipelines features enabled in a Kubernetes cluster.
- Ensure pending resources don't break with changing feature flags on downgrades or upgrades
  - As [handling backwards incompatible changes for pending resources](https://github.com/tektoncd/pipeline/issues/6479) pointed out, we have run into cases where [feature flag information is changed or lost](https://github.com/tektoncd/pipeline/issues/5999) when handling deprecated fields, which caused pending resources to break. However, this issue was introduced by the implementation of feature flags rather than their design, and can be addressed separately.
  - Users can downgrade their pipeline versions without invalidating stored resources, even if stored resources cannot be run with the downgraded server. Keeping stored resources valid relates to storage migration rather than to our feature flag implementation. It is covered in [Storage version migrator v1beta1 -> v1](https://github.com/tektoncd/pipeline/issues/6667) and is out of scope.
