--feature-gate is no longer present in csi-snapshotter causing container to fail. #15221

Closed
DeprecatedLuke opened this issue Dec 28, 2024 · 11 comments
@DeprecatedLuke

DeprecatedLuke commented Dec 28, 2024

Is this a bug report or feature request?

  • Bug Report

The flag '--feature-gate=CSIVolumeGroupSnapshot=true' is passed to the csi-snapshotter container in csi-cephfsplugin-provisioner and csi-rbdplugin-provisioner in release v1.16. This appears to be related to container-storage-interface/spec#573.

The container will not start until the flag is removed from the deployment. The documentation is also inconsistent: https://github.com/kubernetes-csi/external-snapshotter still claims that the --feature-gate flag is required. (kubernetes-csi/external-snapshotter#1223)

Tested versions:

  • registry.k8s.io/sig-storage/csi-snapshotter:v8.2.0
  • registry.k8s.io/sig-storage/csi-snapshotter:v8.0.1 (default)

Looking into it further, this appears to be a typo: --feature-gate vs. --feature-gates.


flag provided but not defined: -feature-gate
Usage of /csi-snapshotter:
  -add_dir_header
        If true, adds the file directory to the header of the log messages
  -alsologtostderr
        log to standard error as well as files (no effect when -logtostderr=true)
  -csi-address string
        Address of the CSI driver socket. (default "/run/csi/socket")
  -extra-create-metadata
        If set, add snapshot metadata to plugin snapshot requests as parameters.
  -feature-gates value
        Comma-seprated list of key=value pairs that describe feature gates for alpha/experimental features. Options are:
        AllAlpha=true|false (ALPHA - default=false)
        AllBeta=true|false (BETA - default=false)
        CSIVolumeGroupSnapshot=true|false (BETA - default=false)
  -groupsnapshot-name-prefix string
        Prefix to apply to the name of a created group snapshot (default "groupsnapshot")
  -groupsnapshot-name-uuid-length int
        Length in characters for the generated uuid of a created group snapshot. Defaults behavior is to NOT truncate. (default -1)
  -http-endpoint :8080
        The TCP network address where the HTTP server for diagnostics, including metrics and leader election health check, will listen (example: :8080). The default is empty string, which means the server is disabled. Only one of `--metrics-address` and `--http-endpoint` can be set.
  -kube-api-burst int
        Burst to use while communicating with the kubernetes apiserver. Defaults to 10. (default 10)
  -kube-api-qps float
        QPS to use while communicating with the kubernetes apiserver. Defaults to 5.0. (default 5)
  -kubeconfig string
        Absolute path to the kubeconfig file. Required only when running out of cluster.
  -leader-election
        Enables leader election.
  -leader-election-lease-duration duration
        Duration, in seconds, that non-leader candidates will wait to force acquire leadership. Defaults to 15 seconds. (default 15s)
  -leader-election-namespace string
        The namespace where the leader election resource exists. Defaults to the pod namespace if not set.
  -leader-election-renew-deadline duration
        Duration, in seconds, that the acting leader will retry refreshing leadership before giving up. Defaults to 10 seconds. (default 10s)
  -leader-election-retry-period duration
        Duration, in seconds, the LeaderElector clients should wait between tries of actions. Defaults to 5 seconds. (default 5s)
  -log_backtrace_at value
        when logging hits line file:N, emit a stack trace
  -log_dir string
        If non-empty, write log files in this directory (no effect when -logtostderr=true)
  -log_file string
        If non-empty, use this log file (no effect when -logtostderr=true)
  -log_file_max_size uint
        Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
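
For anyone wondering why a one-letter difference stops the container: the "flag provided but not defined" message in the log above is the standard error from Go's flag package when an argument names a flag that was never registered. The following is only a minimal sketch of that behavior, not the csi-snapshotter source. The helper name parseArgs and the flag-set name are hypothetical, and a plain string flag stands in for the real feature-gate value type.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseArgs is a hypothetical helper. It registers only -feature-gates
// (the plural name listed in the usage output above) and parses args.
// A plain string flag is an assumption standing in for the real type.
func parseArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("csi-snapshotter-sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the usage dump out of this example
	gates := fs.String("feature-gates", "", "comma-separated key=value pairs")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *gates, nil
}

func main() {
	// Singular name: not registered, so parsing fails.
	_, err := parseArgs([]string{"--feature-gate=CSIVolumeGroupSnapshot=true"})
	fmt.Println("singular:", err)

	// Plural name: registered, so the value is accepted.
	gates, err := parseArgs([]string{"--feature-gates=CSIVolumeGroupSnapshot=true"})
	fmt.Println("plural:", gates, err)
}
```

A real binary that parses with exit-on-error semantics terminates at this point, which would match the container never starting.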
@DeprecatedLuke
Author

DeprecatedLuke commented Dec 28, 2024

This appears to be fixed in 477b7fd, which is part of the v1.16.0 release, but the Helm chart does not include that fix even though the image version is docker.io/rook/ceph:v1.16.0.

@DeprecatedLuke
Author

Tracked it down to a transparent Helm upgrade conflict.

@DeprecatedLuke
Author

DeprecatedLuke commented Dec 28, 2024

The v1.16 Helm chart should use v8.2.0, as v8.0.1 does not have --feature-gates:

  snapshotter:
    # -- Kubernetes CSI snapshotter image repository
    repository: registry.k8s.io/sig-storage/csi-snapshotter
    # -- Snapshotter image tag
    tag: v8.2.0

@warstrolo

From what I have seen in my latest installation, this is already included in the latest chart. The problem does not reside in the chart but in the operator code.

The operator does not check the version of the deployed CSI snapshotter against the parameters it passes to it.
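
To illustrate the kind of check meant here, below is a rough sketch of choosing the group-snapshot argument from the snapshotter image tag. It is not Rook's actual operator code. The function names parseTag and groupSnapshotArg are hypothetical, and the v8.2 cutoff and the `=true` form of the older flag are assumptions drawn only from the error logs in this thread (v8.0.1 rejecting --feature-gates, v8.2.0 rejecting -enable-volume-group-snapshots).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTag extracts the major and minor numbers from an image tag
// such as "v8.2.0". Tags without a numeric major.minor are rejected.
func parseTag(tag string) (major, minor int, err error) {
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unrecognized tag %q", tag)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("bad major version in %q: %w", tag, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("bad minor version in %q: %w", tag, err)
	}
	return major, minor, nil
}

// groupSnapshotArg returns the argument to pass for a given tag.
// ASSUMPTION: v8.2 and later take the feature gate; older tags take
// the standalone flag. The cutoff is inferred from this thread only.
func groupSnapshotArg(tag string) (string, error) {
	major, minor, err := parseTag(tag)
	if err != nil {
		return "", err
	}
	if major > 8 || (major == 8 && minor >= 2) {
		return "--feature-gates=CSIVolumeGroupSnapshot=true", nil
	}
	return "--enable-volume-group-snapshots=true", nil
}

func main() {
	for _, tag := range []string{"v8.0.1", "v8.2.0"} {
		arg, err := groupSnapshotArg(tag)
		fmt.Println(tag, arg, err)
	}
}
```

Returning an error for tags that cannot be parsed (for example a digest or "latest") leaves the caller to decide on a fallback rather than guessing a flag.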

@DeprecatedLuke
Author

> From what I have seen in my latest installation, this is already included in the latest chart. The problem does not reside in the chart but in the operator code.
>
> The operator does not check the version of the deployed CSI snapshotter against the parameters it passes to it.

Actually, the operator Helm chart only has the v8.0.1 version.

@DeprecatedLuke
Author

This appears to be fixed in later v1.16.0 releases. Perhaps an older build of v1.16.0 was pulled?

@travisn
Member

travisn commented Jan 2, 2025

This was fixed by #15196, but has not been included in a release yet. See this comment for a workaround, and we will get the v1.16.1 release out very soon with this fix.

@djjudas21

Hey, sorry to be the bearer of bad news, but I'm running into this on a greenfield cluster today with v1.16.1. Deployed via Helm, accepting the default image versions throughout.

Here are the versions I'm running:

$ kubectl describe po rook-ceph-operator-649b68d476-d5xkc | grep 'Image:'
    Image:         docker.io/rook/ceph:v1.16.1
$ kubectl describe po csi-cephfsplugin-provisioner | grep 'Image:' | sort -u
    Image:         quay.io/cephcsi/cephcsi:v3.13.0
    Image:         registry.k8s.io/sig-storage/csi-attacher:v4.6.1
    Image:         registry.k8s.io/sig-storage/csi-provisioner:v5.0.1
    Image:         registry.k8s.io/sig-storage/csi-resizer:v1.11.1
    Image:         registry.k8s.io/sig-storage/csi-snapshotter:v8.2.0
$ kubectl logs csi-cephfsplugin-provisioner-7999d48585-fxsxj csi-snapshotter
flag provided but not defined: -enable-volume-group-snapshots
Usage of /csi-snapshotter:
  -add_dir_header
    	If true, adds the file directory to the header of the log messages
  -alsologtostderr
    	log to standard error as well as files (no effect when -logtostderr=true)
  -csi-address string
    	Address of the CSI driver socket. (default "/run/csi/socket")
  -extra-create-metadata
    	If set, add snapshot metadata to plugin snapshot requests as parameters.
  -feature-gates value
    	Comma-seprated list of key=value pairs that describe feature gates for alpha/experimental features. Options are:
    	AllAlpha=true|false (ALPHA - default=false)
    	AllBeta=true|false (BETA - default=false)
    	CSIVolumeGroupSnapshot=true|false (BETA - default=false)
  -groupsnapshot-name-prefix string
    	Prefix to apply to the name of a created group snapshot (default "groupsnapshot")
  -groupsnapshot-name-uuid-length int
    	Length in characters for the generated uuid of a created group snapshot. Defaults behavior is to NOT truncate. (default -1)
  -http-endpoint :8080
    	The TCP network address where the HTTP server for diagnostics, including metrics and leader election health check, will listen (example: :8080). The default is empty string, which means the server is disabled. Only one of `--metrics-address` and `--http-endpoint` can be set.
  -kube-api-burst int
    	Burst to use while communicating with the kubernetes apiserver. Defaults to 10. (default 10)
  -kube-api-qps float
    	QPS to use while communicating with the kubernetes apiserver. Defaults to 5.0. (default 5)
  -kubeconfig string
    	Absolute path to the kubeconfig file. Required only when running out of cluster.
  -leader-election
    	Enables leader election.
  -leader-election-lease-duration duration
    	Duration, in seconds, that non-leader candidates will wait to force acquire leadership. Defaults to 15 seconds. (default 15s)
  -leader-election-namespace string
    	The namespace where the leader election resource exists. Defaults to the pod namespace if not set.
  -leader-election-renew-deadline duration
    	Duration, in seconds, that the acting leader will retry refreshing leadership before giving up. Defaults to 10 seconds. (default 10s)
  -leader-election-retry-period duration
    	Duration, in seconds, the LeaderElector clients should wait between tries of actions. Defaults to 5 seconds. (default 5s)
  -log_backtrace_at value
    	when logging hits line file:N, emit a stack trace
  -log_dir string
    	If non-empty, write log files in this directory (no effect when -logtostderr=true)
  -log_file string
    	If non-empty, use this log file (no effect when -logtostderr=true)
  -log_file_max_size uint
    	Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
  -logtostderr
    	log to standard error instead of files (default true)
  -metrics-address :8080
    	(deprecated) The TCP network address where the prometheus metrics endpoint will listen (example: :8080). The default is empty string, which means metrics endpoint is disabled. Only one of `--metrics-address` and `--http-endpoint` can be set.
  -metrics-path /metrics
    	The HTTP path where prometheus metrics will be exposed. Default is /metrics. (default "/metrics")
  -node-deployment
    	Enables deploying the sidecar controller together with a CSI driver on nodes to manage snapshots for node-local volumes.
  -one_output
    	If true, only write logs to their native severity level (vs also writing to each lower severity level; no effect when -logtostderr=true)
  -resync-period duration
    	Resync interval of the controller. Default is 15 minutes (default 15m0s)
  -retry-interval-max duration
    	Maximum retry interval of failed volume snapshot creation or deletion. Default is 5 minutes. (default 5m0s)
  -retry-interval-start duration
    	Initial retry interval of failed volume snapshot creation or deletion. It doubles with each failure, up to retry-interval-max. Default is 1 second. (default 1s)
  -skip_headers
    	If true, avoid header prefixes in the log messages
  -skip_log_headers
    	If true, avoid headers when opening log files (no effect when -logtostderr=true)
  -snapshot-name-prefix string
    	Prefix to apply to the name of a created snapshot (default "snapshot")
  -snapshot-name-uuid-length int
    	Length in characters for the generated uuid of a created snapshot. Defaults behavior is to NOT truncate. (default -1)
  -stderrthreshold value
    	logs at or above this threshold go to stderr when writing to files and stderr (no effect when -logtostderr=true or -alsologtostderr=true) (default 2)
  -timeout duration
    	The timeout for any RPCs to the CSI driver. Default is 1 minute. (default 1m0s)
  -v value
    	number for the log level verbosity
  -version
    	Show version.
  -vmodule value
    	comma-separated list of pattern=N settings for file-filtered logging
  -worker-threads int
    	Number of worker threads. (default 10)

@djjudas21

Hmm, I'm on RKE2, so it looks like #15234 (comment) is relevant. Downgrading to v1.16.0 seems to work around it for now.

@DeprecatedLuke
Author

DeprecatedLuke commented Jan 3, 2025

You are using the v1alpha1 snapshotter CRDs. Update to v1beta1.

@Madhu-1
Member

Madhu-1 commented Jan 6, 2025

As most of us know, alpha CRDs are meant only for testing. We should not use them in production clusters, as they are not supposed to be backward compatible and can break without any upgrade support. I am not sure why RKE2 installs the alpha CRDs by default; I hope there is an option to disable that. Can someone check the RKE Helm charts?
