
[main] Add flag to control CRD update features #114

Merged
merged 20 commits into from
Dec 13, 2024

Conversation

mallardduck
Member

@mallardduck mallardduck commented Oct 24, 2024

Related Issue: rancher/rancher#46764 / SURE-8872

Per the title, this PR adds features that allow users to opt out of CRD updates and to disable the helm-controller CRDs on k3s/rke2.

In general, if a user opts out of CRD updates (via `.crdManagement.update: false`), they become responsible for either:
a) manually applying the expected CRD updates, or
b) enabling the CRD updates value for a short period of time and then re-disabling it.

For the CRDs we inherit from helm-controller, users can opt out of PromFed managing them at all; in that case, the CRDs are managed by helm-controller itself. Thanks to changes in helm-controller, multiple operators can now interact with these CRDs without causing issues: each controller only acts on CRD resources whose annotations cite it as the manager.


The original issue's description is no longer as accurate as it once was (i.e. on recent k3s/rke2 versions things work differently). However, this PR addresses the root concern of the issue even though the conditions on modern distros have changed.


New values added:

# Configure how the operator will manage the CustomResourceDefinitions (CRDs) it needs to function.
crdManagement:
  # Enable or disable automatic updates of CRDs during startup.
  # When true, all CRDs will be updated to the version the operator provides.
  # When false, only missing CRDs will be installed, and existing ones will not be updated.
  update: true
  # Specify whether the operator should detect K3s and RKE2 clusters and exclude `helm-controller` CRDs from management.
  # When true, `helm-controller` CRDs will not be managed by the operator in these environments, as K3s/RKE2 handle them internally.
  # When false, the operator will manage `helm-controller` CRDs regardless of the runtime environment.
  detectK3sRke2: false

These defaults ship with the existing functionality out of the box. This preserves the behavior most users would expect today, so that unexpected configuration changes don't create issues for them.

Users on k3s/RKE2 may want to set detectK3sRke2 to true for the simplest out-of-the-box experience. However, they can go further and also disable CRD updates and/or the helm-controller that PromFed ships. When using these options, they'll need to ensure the cluster's built-in helm-controller is properly configured to manage the Charts for PromFed.
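For example, a k3s/rke2 user who wants the cluster's built-in helm-controller to own its CRDs, and who accepts responsibility for applying CRD updates manually, could override values like this (a sketch based on the options above; only the keys introduced by this PR are used):

```yaml
# values.yaml override for a k3s/rke2 cluster:
# - skip managing the helm-controller CRDs (the cluster handles them)
# - stop updating the remaining CRDs on startup (only install missing ones)
crdManagement:
  update: false
  detectK3sRke2: true
```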


Future plans

Once we have a more complete framework-level solution for tracking CRD ownership, we can immediately replace the k3s/rke2-specific logic with it (and remove the added helpers that check the nodes). We can then simultaneously deprecate and remove the chart values: with the method used to implement the feature here, leaving them set will have no effect and won't break anything.


Images built for PR #114:

  • Helm Locker: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator/helm-locker:pr-114 ghcr.io/rancher/prometheus-federator/helm-locker:pr-114-4bc6a44

  • Helm Project Operator: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114 ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114-4bc6a44

  • Prometheus Federator: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator:pr-114 ghcr.io/rancher/prometheus-federator:pr-114-4bc6a44

@mallardduck mallardduck changed the title Crd install bugfix [main] Fix CRDs being installed even when they already exist and don't need updating Oct 25, 2024
@mallardduck mallardduck marked this pull request as ready for review October 25, 2024 21:26
@mallardduck mallardduck requested a review from a team as a code owner October 25, 2024 21:26

@mallardduck
Member Author

mallardduck commented Oct 28, 2024

So with this fix in place, if you create a cluster (using k3s/rke2) and later install the chart, you should see CRDs with two sets of timestamps: one from cluster creation and one from the helm chart install time. The helmcharts and helmchartconfigs will carry the creation time, like:

helmchartconfigs.helm.cattle.io                                   2024-10-28T00:53:16Z
helmcharts.helm.cattle.io                                         2024-10-28T00:53:16Z
helmreleases.helm.cattle.io                                       2024-10-28T15:23:18Z
projecthelmcharts.helm.cattle.io                                  2024-10-28T15:23:18Z

Actually, while this seems like an improvement and the correct behavior... I'm currently unable to replicate this at all on 2.9.2 with 104.0.0+up0.4.0, which should behave the same as the reported prometheus-federator-103.0.2+up0.4.0 that has the issue.

I guess I'll try to setup 2.8 cluster and see if I can even replicate the original issue.


Oh right - the creation date is what we're seeing, and that doesn't change. I should have compared status fields to check the updated-at times instead. I'm currently rebuilding the lab to test on 2.8 anyway, but I'll remember that.

@mallardduck
Member Author

Summary of Helm-Controller Situation:

| Helm Controller | Release Date | CRD Changes (vs. Current) | RKE2 | K3S | PromFed | PromFed Update Safety |
|---|---|---|---|---|---|---|
| v0.13.1 | Nov 23, 2022 | N/A | v1.26.1+rke2r1, v1.25.6+rke2r1 | v1.26.3+k3s1, v1.25.8+k3s1 | Current | N/A |
| v0.13.3 | Apr 4, 2023 | False | N/A | v1.27.1+k3s1, v1.26.4+k3s1, v1.25.9+k3s1 | N/A | Very Low Risk |
| v0.14.0 | May 10, 2023 | True | (skipped, not helpful info) | (skipped, not helpful info) | N/A | Higher Risk |
| v0.15.4 | Aug 15, 2023 | True | v1.29.4+rke2r1, v1.28.9+rke2r1, v1.27.13+rke2r1, v1.26.12+rke2r1, v1.25.16+rke2r2 | v1.29.0+k3s1, v1.28.5+k3s1, v1.27.9+k3s1, v1.26.12+k3s1, v1.25.16+k3s4 | N/A | Higher Risk |
| v0.16.4 | Sep 5, 2024 | True; even more | v1.30.5+rke2r1 | v1.30.5+k3s1 | N/A | Very High Risk |

Note that I've been testing with 1.28.14, which matches up with v0.15.13 of helm-controller. And as noted in the table, v0.13.1 of helm-controller currently lines up with the one PromFed ships.

More info on CRD differences: https://gist.github.com/mallardduck/120b1d5d833126ae2cb8ff2df20c53c5


github-actions bot commented Dec 9, 2024

Images built for PR #114:

  • Helm Locker: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator/helm-locker:pr-114 ghcr.io/rancher/prometheus-federator/helm-locker:pr-114-0ecd192

  • Helm Project Operator: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114 ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114-0ecd192

  • Prometheus Federator: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator:pr-114 ghcr.io/rancher/prometheus-federator:pr-114-0ecd192

@mallardduck mallardduck force-pushed the crd-install-bugfix branch 2 times, most recently from d0e648b to 29374cc Compare December 9, 2024 21:08
@mallardduck mallardduck changed the title [main] Fix CRDs being installed even when they already exist and don't need updating [main] Add flag to control CRD update features Dec 9, 2024
@mallardduck mallardduck requested review from alexandreLamarre and removed request for joshmeranda December 9, 2024 21:15



Images built for PR #114:

  • Helm Locker: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator/helm-locker:pr-114-cb3f41e ghcr.io/rancher/prometheus-federator/helm-locker:pr-114

  • Helm Project Operator: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114-cb3f41e ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114

  • Prometheus Federator: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator:pr-114-cb3f41e ghcr.io/rancher/prometheus-federator:pr-114


Images built for PR #114:

  • Helm Locker: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator/helm-locker:pr-114-d5b5c32 ghcr.io/rancher/prometheus-federator/helm-locker:pr-114

  • Helm Project Operator: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114-d5b5c32 ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114

  • Prometheus Federator: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator:pr-114-d5b5c32 ghcr.io/rancher/prometheus-federator:pr-114


@mallardduck mallardduck force-pushed the crd-install-bugfix branch 2 times, most recently from ee50dc1 to 39a577d Compare December 10, 2024 21:13

Images built for PR #114:

  • Helm Locker: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator/helm-locker:pr-114-39a577d ghcr.io/rancher/prometheus-federator/helm-locker:pr-114

  • Helm Project Operator: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114-39a577d ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114

  • Prometheus Federator: Link to image:
    Tags: ghcr.io/rancher/prometheus-federator:pr-114-39a577d ghcr.io/rancher/prometheus-federator:pr-114


Images built for PR #114:

  • Helm Locker - Tags:
    • ghcr.io/rancher/prometheus-federator/helm-locker:pr-114-67babd8 ghcr.io/rancher/prometheus-federator/helm-locker:pr-114
  • Helm Project Operator - Tags:
    • ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114-67babd8 ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114
  • Prometheus Federator - Tags:
    • ghcr.io/rancher/prometheus-federator:pr-114-67babd8 ghcr.io/rancher/prometheus-federator:pr-114

Helm Debug Values:

  image:
    pullPolicy: Always
    registry: ghcr.io
    repository: rancher/prometheus-federator
    tag: pr-114-67babd8

For now we indirectly rely on checking nodes for the k8s runtime type; if we find k3s/rke2, we skip those CRDs.
In the future, we'd like to be able to directly query the CRD on the cluster and see a clear annotation/label citing the CRD's owner.

Images built for PR #114:

  • Helm Locker - Tags:
    • ghcr.io/rancher/prometheus-federator/helm-locker:pr-114-94d7ddf ghcr.io/rancher/prometheus-federator/helm-locker:pr-114
  • Helm Project Operator - Tags:
    • ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114-94d7ddf ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114
  • Prometheus Federator - Tags:
    • ghcr.io/rancher/prometheus-federator:pr-114-94d7ddf ghcr.io/rancher/prometheus-federator:pr-114

Helm Debug Values:

  image:
    pullPolicy: Always
    registry: ghcr.io
    repository: rancher/prometheus-federator
    tag: pr-114-94d7ddf

Contributor

@alexandreLamarre alexandreLamarre left a comment


The other thing we need to start doing is adding behavioral tests in Go for every code change: establishing clear expected behavior in something other than just documentation, and having CI catch regressions.

This PR should include those tests as well and I'm happy to sync up about those.
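For instance, if the detection gate were factored into a pure function, a behavioral test could pin the expected truth table down in CI. This is a self-contained sketch: the real function takes a *rest.Config and queries nodes, so the signature and the idea of passing the runtime type in directly are assumptions, not the PR's actual code.

```go
package main

import "fmt"

// shouldManageHelmControllerCRDs is a simplified stand-in for the PR's logic:
// when detection is disabled the operator always manages the CRDs; when it is
// enabled, management is skipped on k3s/rke2 (hypothetical refactor where the
// runtime type is an argument instead of being queried from the cluster).
func shouldManageHelmControllerCRDs(runtimeType string, detectK3sRke2 bool) bool {
	if !detectK3sRke2 {
		return true
	}
	return runtimeType != "k3s" && runtimeType != "rke2"
}

func main() {
	cases := []struct {
		runtime string
		detect  bool
		want    bool
	}{
		{"k3s", true, false},  // detection on: the distro owns these CRDs
		{"rke2", true, false}, // same for rke2
		{"aws", true, true},   // other runtimes: operator keeps managing
		{"k3s", false, true},  // detection off preserves current behavior
	}
	for _, c := range cases {
		got := shouldManageHelmControllerCRDs(c.runtime, c.detect)
		if got != c.want {
			panic(fmt.Sprintf("runtime=%s detect=%v: got %v, want %v", c.runtime, c.detect, got, c.want))
		}
	}
	fmt.Println("all cases pass")
}
```

Keeping the decision in a pure function like this is what makes it testable without a live cluster; the node-listing part can then be covered separately.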

Comment on lines 253 to 302
func shouldManageHelmControllerCRDs(cfg *rest.Config) bool {
	if os.Getenv("DETECT_K3S_RKE2") != "true" {
		logrus.Debug("k3s/rke2 detection feature is disabled; `helm-controller` CRDs will be managed")
		return true
	}

	k8sRuntimeType, err := identifyKubernetesRuntimeType(cfg)
	if err != nil {
		logrus.Error(err)
	}

	onK3sRke2 := k8sRuntimeType == "k3s" || k8sRuntimeType == "rke2"
	if onK3sRke2 {
		logrus.Debug("the cluster is running on k3s (or rke2), `helm-controller` CRDs will not be managed")
	}

	return !onK3sRke2
}

func identifyKubernetesRuntimeType(clientConfig *rest.Config) (string, error) {
	client, err := clients.NewFromConfig(clientConfig, nil)
	if err != nil {
		return "", err
	}

	nodes, err := client.Core.Node().List(metav1.ListOptions{})
	if err != nil {
		logrus.Fatalf("Failed to list nodes: %v", err)
	}
	instanceTypes := make(map[string]int)
	for _, node := range nodes.Items {
		instanceType, exists := node.Labels["node.kubernetes.io/instance-type"]
		if exists {
			instanceTypes[instanceType]++
		} else {
			logrus.Debugf("Cannot find `node.kubernetes.io/instance-type` label on node `%s`", node.Name)
		}
	}

	if len(instanceTypes) == 0 {
		return "", errors.New("cannot identify k8s runtime type; no nodes in cluster have expected label")
	}

	var k8sRuntimeType string
	for instanceType := range instanceTypes {
		k8sRuntimeType = instanceType
		break
	}

	return k8sRuntimeType, nil
}
Contributor


Going back up to the previous discussion: if we can't detect the owners of CRDs by other methods, then we shouldn't have auto-detect logic at all.

It can introduce many incompatibilities across many versions - does it work on older rke2 and k3s versions? Will it work going forward? How will we use a test matrix to ensure it always works? It's what we call an out-of-band dependency, and those should generally be avoided where possible, because they introduce unnecessary complexity both in the code and, in this instance especially, in testing.

I'm almost certain that's also why the original controller has a toggle to use an embedded controller or not, rather than detecting specific distributions. In this case, I'd heavily consider a toggle that just filters CRD updates and omits the detection logic altogether.

Contributor


If we were to introduce distribution-specific behavior, the existing place where Rancher and Rancher Observability sort of handle that is in the charts themselves, with distribution-specific resource definitions - which in my mind is also overkill for this specific instance.

Contributor

@alexandreLamarre alexandreLamarre left a comment


Agreed to leave auto-detection as an experimental feature, and to omit tests so we can move it to prod ASAP.

)

// IdentifyKubernetesRuntimeType provides the k8s runtime used on nodes in the cluster.
// Deprecated: This feature is a stop gap not expected to be maintained long-term.
Member Author


Figured we'd just go for it and deprecate from day 1, even though it feels weird.

Big picture, it feels better: it's a clear signal that this method isn't intended to be used long-term, nor something that needs to be maintained once a better mechanism for getting the same information is in place.

# When false, only missing CRDs will be installed, and existing ones will not be updated.
update: true

# !! EXPERIMENTAL OPTION !! - Use this feature with caution and careful consideration.
Member Author


Similarly, warn end-users that this value is experimental and should be used with care. Ultimately we added it in a way that won't cause issues if we eventually remove it, or even if customers use unexpected combinations of the PromFed chart and PromFed images.

}
}
return
}

// List returns the list of CRDs and dependent CRDs for this operator
-func List() ([]crd.CRD, []crd.CRD) {
+func List() ([]crd.CRD, []crd.CRD, []crd.CRD) {
	// TODO: The underlying crd.CRD is deprecated and will eventually be removed
Member Author


Leave a TODO for the future to remind us that this crd.CRD is deprecated.
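The widened signature could be consumed like this minimal sketch. The types, the CRD names, and the assignment of CRDs to slices are illustrative assumptions from the PR context; the point is that keeping the helm-controller CRDs in their own slice lets callers filter them out independently.

```go
package main

import "fmt"

// CRD is a stand-in for the (deprecated) crd.CRD type used by the operator.
type CRD struct{ Name string }

// List mirrors the new three-slice shape: operator CRDs, dependent CRDs, and
// helm-controller CRDs kept separate so they can be skipped on k3s/rke2.
func List() ([]CRD, []CRD, []CRD) {
	operatorCRDs := []CRD{{"projecthelmcharts.helm.cattle.io"}}
	dependentCRDs := []CRD{{"helmreleases.helm.cattle.io"}}
	helmControllerCRDs := []CRD{
		{"helmcharts.helm.cattle.io"},
		{"helmchartconfigs.helm.cattle.io"},
	}
	return operatorCRDs, dependentCRDs, helmControllerCRDs
}

func main() {
	crds, dependent, helmController := List()
	toApply := append(append([]CRD{}, crds...), dependent...)

	// Only include the helm-controller CRDs when the operator should manage
	// them (e.g. detection found the cluster is NOT k3s/rke2).
	manageHelmController := false
	if manageHelmController {
		toApply = append(toApply, helmController...)
	}
	fmt.Println(len(toApply)) // 2 when helm-controller CRDs are skipped
}
```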

return true
}

// TODO: In the future, this should not rely on detecting k8s runtime type
Member Author


Finally, a very clear TODO here, to make it even more obvious that this will be removed in the future.


Images built for PR #114:

  • Helm Locker - Tags:
    • ghcr.io/rancher/prometheus-federator/helm-locker:pr-114-2cff460 ghcr.io/rancher/prometheus-federator/helm-locker:pr-114
  • Helm Project Operator - Tags:
    • ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114-2cff460 ghcr.io/rancher/prometheus-federator/helm-project-operator:pr-114
  • Prometheus Federator - Tags:
    • ghcr.io/rancher/prometheus-federator:pr-114-2cff460 ghcr.io/rancher/prometheus-federator:pr-114

Helm Debug Values:

  image:
    pullPolicy: Always
    registry: ghcr.io
    repository: rancher/prometheus-federator
    tag: pr-114-2cff460

@mallardduck mallardduck merged commit 9f57328 into rancher:main Dec 13, 2024
17 checks passed
@mallardduck mallardduck deleted the crd-install-bugfix branch December 13, 2024 14:05