Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP for flagz page for Kubernetes Components #4831

Merged
merged 1 commit into from
Oct 9, 2024

Conversation

richabanker
Copy link
Contributor

@richabanker richabanker commented Sep 6, 2024

  • One-line PR description: Add a KEP for flagz page in Kubernetes components.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 6, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 6, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 6, 2024
@richabanker richabanker force-pushed the component-flagz branch 3 times, most recently from 75f707c to 6a00d59 Compare September 13, 2024 21:28
@dgrisonnet
Copy link
Member

/assign


### Data Format and versioning

Initially, the flagz page will exclusively support a plain text format for responses. We will implement these endpoints using versioned URLs, like /v1/flagz, to ensure future compatibility. This versioning strategy allows us to seamlessly introduce structured data formats (e.g., JSON) in the future, through a distinct endpoint such as /v2/flagz, without disrupting existing implementations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to have the endpoints in the form of /flagz/v1? Flagz is the subdomain that we want to version, if we were to put the version first, that would be confusing with the component version as it would then be at the top-level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're right, I got that the other way round. Fixed it now, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I was wondering since then is whether going with a query parameter might not be better. Something like:

/flagz?version=1
/flags?version=2

wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was also raised in the statusz KEP PR. I am ok with either, whichever seems the best to indicate the version for the endpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the KEP to use query params for specifying the version.

that might indicate a serious problem?
-->

- apiserver_request_duration_seconds metric that will tell us if there's a spike in request latency of kube-apiserver that might indicate that the flagz endpoint is interfering with the component's core functionality or causing delays in processing other requests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC you won't get this metric for free, you'll have to make sure that the handler for the flagz endpoint is instrumented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm unsure about how flagz handler relates with this metric which I thought is enabled by default? The idea was to use this metric to monitor general load of requests that apiserver handles and use that as a signal to check if flagz is consuming all system resources causing delays in apiserver's capability to serve other resource requests..

though yeah I guess that wouldnt be a clean signal to detect issues with the flagz endpoint in a deterministic way.. Do you suggest introducing a new metric that keeps track of request latency for just /flagz requests ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to submit my comment on your PR, but you already dealt with that concern of mine in your POC: kubernetes/kubernetes#127581 (comment)

@richabanker
Copy link
Contributor Author

cc @johnbelamaric for PRR. Hi John, could you please review this KEP for prod-readiness whenever you get a chance? Thanks a lot!

@richabanker
Copy link
Contributor Author

/assign @johnbelamaric

whoops meant to add as assignee

@johnbelamaric
Copy link
Member

/assign


1. **No sensitive data exposed**

We will ensure that no sensitive data is exposed through flagz and that access to this endpoint is gated by using system-monitoring group.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how? Do flag definitions allow us to express the idea that the flag is "sensitive" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think there currently is support for marking flags like that. So we would need to introduce a way to express this.
cc @cjcullen who I just reached out to for his thoughts on the same. Also @liggitt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to push that up to pflag? One could argue that falgs should NEVER have sensitive information, since they are visible via ps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to push that up to pflag?

Or.. maybe we could create a separate set of "sensitive flags" and while printing out the values in the flagz handler, redact the values for those flags?

One could argue that falgs should NEVER have sensitive information, since they are visible via ps

That's indeed true. But if we still feel that exposing this info in an endpoint will be easier for attackers to get hold of, we can impose some restrictions on what data to expose in the endpoint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any examples of these sorts of flags?

Copy link
Contributor Author

@richabanker richabanker Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the TLS related flags.. (also waiting for someone from sig-auth to help answer that..)

Copy link
Member

@liggitt liggitt Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware any flags in long-lived components (kube-apiserver, kubelet, scheduler, controller-manager, etc) that contain sensitive information directly. For things like TLS or credentials, they always point at files which contain the data.

There are shorter-lived invocations that take flags containing sensitive information (like kubeadm init with --certificate-key or --token), and components built that bind client-go user flags (like kubectl) can specify a token credential on the command-line using --token. Both of those have the option to consume those values from a file as well.

pflag does have the ability to annotate flags, which we apparently use like this to mark some flags as "classified":

// AddSecretAnnotation add secret flag to Annotation.
func (f FlagInfo) AddSecretAnnotation(flags *pflag.FlagSet) FlagInfo {
	flags.SetAnnotation(f.LongName, "classified", []string{"true"})
	return f
}

and then filter out here when building an annotation to include in API objects when the client chooses to record the change cause in an annotation (--record)

	parseFunc := func(flag *pflag.Flag, value string) error {
		flags = flags + " --" + flag.Name
		if set, ok := flag.Annotations["classified"]; !ok || len(set) == 0 {
			flags = flags + "=" + value
		} else {
			flags = flags + "=CLASSIFIED"
		}
		return nil
	}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks for pointing out the pflag annotation feature. That's perfect for displaying the non-CLASSIFIED flags.


#### Request
* Method: **GET**
* Endpoint: **v1/flagz**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be updated once we've agreed on the versioning format we want to follow

keps/sig-instrumentation/4828-component-flagz/README.md Outdated Show resolved Hide resolved
Copy link

@mattbailey mattbailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shadow prod readiness review

Just some nits/clerical (checkboxes), otherwise PRR looks good.

Copy link
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor points on PRR, but looks OK to me.

keps/sig-instrumentation/4828-component-flagz/README.md Outdated Show resolved Hide resolved
keps/sig-instrumentation/4828-component-flagz/README.md Outdated Show resolved Hide resolved
keps/sig-instrumentation/4828-component-flagz/README.md Outdated Show resolved Hide resolved
keps/sig-instrumentation/4828-component-flagz/README.md Outdated Show resolved Hide resolved
keps/sig-instrumentation/4828-component-flagz/README.md Outdated Show resolved Hide resolved
@johnbelamaric
Copy link
Member

PRR looks good, I will wait for sig approval to use the magic word

@richabanker richabanker force-pushed the component-flagz branch 5 times, most recently from 98f4f05 to c6c6e53 Compare October 9, 2024 15:38
@dgrisonnet
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2024
@johnbelamaric
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrisonnet, johnbelamaric, richabanker

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 9, 2024
@k8s-ci-robot k8s-ci-robot merged commit 6bdbc9a into kubernetes:master Oct 9, 2024
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants