TEP-0138: Improve Tekton Pipeline Feature Flags
This commit adds the problem statement for improving Tekton Pipeline Feature
Flags.
JeromeJu committed Jul 13, 2023
1 parent cf8cd54 commit 5eeacf5
Showing 3 changed files with 76 additions and 0 deletions.
2 changes: 2 additions & 0 deletions teps/0033-tekton-feature-gates.md
@@ -1,5 +1,7 @@
---
status: implemented
superseded-by:
- TEP-0138
title: Tekton Feature Gates
creation-date: '2020-11-20'
last-updated: '2021-12-16'
73 changes: 73 additions & 0 deletions teps/0138-improve-tekton-pipeline-feature-flags.md
@@ -0,0 +1,73 @@
---
status: proposed
title: Improve Tekton Pipeline Feature Flags
creation-date: '2023-07-07'
last-updated: '2023-07-07'
authors:
- '@JeromeJu'
- '@chitrangpatel'
- '@lbernick'
---

# TEP-0138: Improve Tekton Pipeline Feature Flags

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)

## Summary

This document proposes updating Tekton Pipelines' feature flags design, as originally proposed in [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md), to address problems caused by coupling API versioning to feature versioning, and to provide a better experience for cluster operators.

## Motivation

Two design decisions from [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md), grouped feature flags and the coupling of feature stability to CRD API version, have created challenges for cluster operators, users, and maintainers in using the existing feature flags.

### Grouped feature flags

New features in Tekton start at `alpha` stability and can be promoted to `beta` or `stable`. Today, we have a single feature flag, `enable-api-fields`, that enables all fields in the Tekton API supporting features at a certain stability level. `enable-api-fields` can be set to `alpha`, `beta`, or `stable`, specifying the minimum stability level for any fields used in Tekton APIs on that cluster. In addition, features that aren't controlled by fields in Tekton APIs have their own feature flags. For more information, see [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md). This approach has led to the following pain points:

- Cluster operators can't enable individual `alpha` or `beta` features controlled by API fields.
  - Since features at a lower stability level tend to have more bugs, cluster operators and vendors may want to limit the use of lower-stability features they do not need, while still being able to enable individual features for specific use cases.
  - Enabling all alpha features at once can complicate the use and debugging of stable and beta features. For example, see [the alpha object params feature causing problems when debugging beta resolver features](https://github.com/tektoncd/pipeline/issues/6365) and [alpha propagated object params leading to unclear behavior when all alpha features are opted into together](https://github.com/tektoncd/pipeline/issues/5988).

- Some behavioral feature flags cause confusion because they interact inconsistently with the grouped feature flag. For example, `enforce-nonfalsifiability` is a behavioral flag that is also gated by `enable-api-fields`, which forces cluster operators interested in this feature to set two flags to enable a single feature.
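
To make the grouped semantics concrete, here is a minimal Go sketch that models `enable-api-fields` as a minimum stability level and shows the double gate on `enforce-nonfalsifiability`. It is illustrative only: the function names are hypothetical rather than Tekton's actual code, and the assumption that the behavioral flag is turned on by the value `spire` and additionally requires `enable-api-fields: alpha` should be checked against the Tekton Pipelines documentation.

```go
package main

// stabilityRank orders the values accepted by enable-api-fields:
// a lower rank means a less stable feature.
var stabilityRank = map[string]int{
	"alpha":  0,
	"beta":   1,
	"stable": 2,
}

// fieldAllowed reports whether an API field at fieldStability may be used when
// enable-api-fields is set to minStability. Because there is one grouped flag,
// allowing any alpha field allows every alpha field.
func fieldAllowed(minStability, fieldStability string) bool {
	minRank, okMin := stabilityRank[minStability]
	fieldRank, okField := stabilityRank[fieldStability]
	if !okMin || !okField {
		return false // unknown stability levels are rejected
	}
	return fieldRank >= minRank
}

// nonfalsifiabilityEnabled illustrates a behavioral flag that is also gated by
// the grouped flag. Assumption: the feature is on when enforce-nonfalsifiability
// is "spire", and it additionally requires enable-api-fields to be "alpha".
func nonfalsifiabilityEnabled(enableAPIFields, enforceNonfalsifiability string) bool {
	return enforceNonfalsifiability == "spire" && enableAPIFields == "alpha"
}
```

The second function is the pain point in miniature: an operator who only wants this one feature must change two independent settings, and one of them opts the cluster into every other alpha field as well.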

### Feature stability coupled to CRD API version

Per [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md), the behavior of `enable-api-fields` depends on the CRD API version being used. In v1beta1 CRDs, `beta` features are enabled when `enable-api-fields` is set to `alpha`, `beta`, or `stable`, but in v1 CRDs, `beta` features are enabled only when `enable-api-fields` is set to `alpha` or `beta`. This couples API versioning to feature stability and has led to the following pain points:

- [Feedback indicates](https://github.com/tektoncd/pipeline/issues/6592#issuecomment-1533268522) that users upgrading their CRDs from v1beta1 to v1 were surprised to find that `beta` features that worked by default in v1beta1 did not work by default in v1 when `enable-api-fields` was set to `stable` (its default value at the time). This is particularly confusing for users who are not cluster operators and cannot control the value of `enable-api-fields`, and even more so if they are not aware that they are using `beta` features.

- For maintainers, swapping the storage version from v1beta1 to v1 is a maintenance operation that should not have affected users. However, we had to [change the user-facing default value of `enable-api-fields` from `stable` to `beta`](https://github.com/tektoncd/pipeline/pull/6732) before changing the storage version of the API, to [avoid breaking PipelineRuns that use `beta` features](https://github.com/tektoncd/pipeline/pull/6444#issuecomment-1580926707).

- Tying feature promotion to the API versions that exist can confuse contributors. For example, during [the promotion of projected workspaces to beta](https://github.com/tektoncd/pipeline/pull/5530), the existence of the v1 API caused confusion about how `beta` features should behave in v1beta1 and how that differs from v1.
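
The coupling can be sketched as a function that needs the API version, not just the flag value, to decide whether `beta` fields are allowed. This is a hypothetical Go illustration of the rule stated above (including the assumption that `alpha` also enables `beta` fields), not Tekton's validation code.

```go
package main

// betaFieldsEnabled sketches the TEP-0033 rule: whether beta fields are allowed
// depends on both the CRD apiVersion and the enable-api-fields value.
func betaFieldsEnabled(apiVersion, enableAPIFields string) bool {
	switch apiVersion {
	case "tekton.dev/v1beta1":
		// In v1beta1, beta fields are allowed even when the flag is "stable".
		return enableAPIFields == "alpha" || enableAPIFields == "beta" || enableAPIFields == "stable"
	case "tekton.dev/v1":
		// In v1, beta fields require the flag to be "alpha" or "beta".
		return enableAPIFields == "alpha" || enableAPIFields == "beta"
	default:
		return false // unknown API versions enable nothing in this sketch
	}
}
```

With `stable`, the default value at the time, `betaFieldsEnabled("tekton.dev/v1beta1", "stable")` is true while `betaFieldsEnabled("tekton.dev/v1", "stable")` is false: the same cluster configuration gives different answers for the same field, which is the surprise users hit when migrating from v1beta1 to v1.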

## Goals

- [Decouple API versioning and feature stability](https://github.com/tektoncd/pipeline/issues/6592)
  - Feature validation and implementation should be the same for all API versions.
- Cluster operators can enable an individual feature without being forced to opt into other features at the same stability level.
- Increase the visibility of stability levels for features that have their own feature flags and for features that are not tied to the API spec.
- Describe a testing strategy that will give us confidence in our implementation.
  - One of the main reasons why [TEP-0033](https://github.com/tektoncd/community/blob/main/teps/0033-tekton-feature-gates.md#pros-and-cons) chose the existing style of feature flags was to keep the testing matrix simple. As we change how features are versioned, we need to ensure that our testing strategy still gives us confidence in our feature coverage.
  - Restructure the conditions for running integration tests to avoid [skipped integration tests](https://github.com/tektoncd/pipeline/issues/6079) caused by tests requiring different sets of behavioral feature flags at certain stability levels.
- Minimize backwards incompatible changes and provide simple migration paths for cluster operators.
- Cluster operators can set a default feature stability level.
  - For example, now that `beta` is the default stability level, a cluster operator who wants to use only `stable` features has to learn about the change from the release notes and set the stability level back to `stable`.
- Make the usage of the grouped feature flag and the behavioral feature flags consistent.
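
As one hypothetical way to picture several of these goals together (per-feature enablement, a cluster-wide default stability level, and validation that ignores the API version), consider the Go sketch below. The `Config` and `Feature` types, their fields, and the example feature names are invented for illustration; they are not part of Tekton, and this problem statement does not yet propose a design.

```go
package main

// Stability is a hypothetical ordered stability level; higher is more stable.
type Stability int

const (
	Alpha Stability = iota
	Beta
	Stable
)

// Feature is a hypothetical description of one feature and its stability.
type Feature struct {
	Name      string
	Stability Stability
}

// Config is a hypothetical cluster-wide configuration: a default minimum
// stability plus explicit per-feature overrides keyed by feature name.
type Config struct {
	DefaultStability Stability
	Overrides        map[string]bool
}

// Enabled reports whether f is enabled. An explicit override wins; otherwise
// the feature is enabled if it is at least as stable as the default.
// There is deliberately no apiVersion parameter: the answer is the same for
// v1beta1 and v1 resources.
func (c Config) Enabled(f Feature) bool {
	if on, ok := c.Overrides[f.Name]; ok {
		return on
	}
	return f.Stability >= c.DefaultStability
}
```

Under this sketch, an operator could keep `DefaultStability: Stable` and still turn on a single alpha feature through `Overrides`, without opting into any other alpha or beta feature.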

## Non-Goals

- Better guidance on feature promotion and when features can be promoted
  - This is a nice-to-have but not a blocker, since the feature graduation process should not affect how features are enabled.
- Separation of controls for cluster operators and end users
  - On a cluster that serves multiple teams, if a cluster operator enables or disables a feature, all teams are affected. This might break Pipelines or Tasks for teams that rely on that feature. For example, if the cluster operator switches from allowing `alpha` features to allowing only `beta` or `stable` features, users who currently rely on `alpha` features can no longer use them.
  - [TEP-0085: Per-Namespace Feature Flags](https://github.com/tektoncd/community/blob/main/teps/0085-per-namespace-controller-configuration.md) is better suited to address this problem. This proposal focuses on the user experience of configuring the Tekton Pipelines features enabled in a Kubernetes cluster.
- Ensure pending resources don't break when feature flags change on downgrades or upgrades
  - As [handling backwards incompatible changes for pending resources](https://github.com/tektoncd/pipeline/issues/6479) pointed out, we have run into cases where [feature flag information was changed or lost](https://github.com/tektoncd/pipeline/issues/5999) when handling deprecated fields, which caused pending resources to break. However, this issue was introduced by the implementation of feature flags rather than by their design, and can be addressed separately.
  - Users can downgrade their Pipelines version without invalidating stored resources, even if stored resources cannot be run with the downgraded server. Keeping stored resources valid relates to storage migration rather than to our feature flag implementation; it is covered in [Storage version migrator v1beta1 -> v1](https://github.com/tektoncd/pipeline/issues/6667) and is out of scope.
- Strict control of behavioral flags for a specific stability level using a grouped flag
  - The [review of the existing behavioral flags in Tekton Pipelines](https://docs.google.com/document/d/11F_UU2ZSQoDMci87sFxxJZeFJWXzaapZKLl_oT9M4BY) suggests that it is neither necessary nor possible to gate all behavioral features behind a single stability level, because not all features gated by a behavioral flag are at the same stability level. For example, the `coschedule` flag has four values, two of which are `stable` and two of which are `alpha`.
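
The `coschedule` example can be sketched as a per-value lookup. The value names and their stability levels below are assumptions based on TEP-0135 and the Tekton Pipelines docs (`workspaces` and `disabled` as stable, `pipelineruns` and `isolate-pipelinerun` as alpha), and `coscheduleAllowed` is a hypothetical helper, not how Tekton validates the flag today.

```go
package main

// coscheduleStability maps each coschedule value to a stability level.
// Assumption: values and levels as described in TEP-0135 and the Tekton docs.
var coscheduleStability = map[string]string{
	"workspaces":          "stable",
	"disabled":            "stable",
	"pipelineruns":        "alpha",
	"isolate-pipelinerun": "alpha",
}

// levelRank orders stability levels; a lower rank means less stable.
var levelRank = map[string]int{"alpha": 0, "beta": 1, "stable": 2}

// coscheduleAllowed is a hypothetical check of one coschedule value against a
// minimum stability level. The required level depends on the value, so the
// flag as a whole cannot be placed at a single stability level.
func coscheduleAllowed(minStability, value string) bool {
	valueLevel, ok := coscheduleStability[value]
	if !ok {
		return false // unknown coschedule value
	}
	minRank, ok := levelRank[minStability]
	if !ok {
		return false // unknown stability level
	}
	return levelRank[valueLevel] >= minRank
}
```

Because the stability lives on each value rather than on the flag, a grouped flag could at most gate individual values; it cannot gate `coschedule` as a whole without either blocking stable values or admitting alpha ones.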
1 change: 1 addition & 0 deletions teps/README.md
@@ -125,3 +125,4 @@ This is the complete list of Tekton TEPs:
|[TEP-0133](0133-configure-default-resolver.md) | Configure Default Resolver | implemented | 2023-03-21 |
|[TEP-0134](0134-concise-pipelines.md) | Concise Pipelines | proposed | 2023-04-28 |
|[TEP-0135](0135-coscheduling-pipelinerun-pods.md) | Coscheduling PipelineRun pods | implementable | 2023-06-22 |
|[TEP-0138](0138-improve-tekton-pipeline-feature-flags.md) | Improve Tekton Pipeline Feature Flags | proposed | 2023-07-07 |
