
Unclear behaviour of selector field in ExtendedDaemonSet/ExtendedDaemonSetReplicaSet #100

Open
syndicut opened this issue Jun 11, 2021 · 5 comments
Labels: bug, component/controller

Comments

@syndicut
Contributor

Describe the bug
ExtendedDaemonSet has a selector field (https://github.com/DataDog/extendeddaemonset/blob/main/api/v1alpha1/extendeddaemonset_types.go#L22), whose doc comment states:

	// A label query over pods that are managed by the daemon set.
	// Must match in order to be controlled.
	// If empty, defaulted to labels on Pod template.
	// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors

This field is only used to initialize new ERS:
https://github.com/DataDog/extendeddaemonset/blob/main/controllers/extendeddaemonset/controller.go#L528
In the ERS it is used to match node (not pod) labels:
https://github.com/DataDog/extendeddaemonset/blob/main/controllers/extendeddaemonsetreplicaset/controller.go#L351
I'm not sure how this differs from specifying nodeSelector in spec.template.spec.
It also gets printed in the listing as "NODE SELECTOR":
https://github.com/DataDog/extendeddaemonset/blob/main/api/v1alpha1/extendeddaemonsetreplicaset_types.go#L118
Questions:

  1. Does the comment on this field need updating? It describes a label query over pods, but the field is actually used as a query over nodes.
  2. What is the correct way to limit EDS pods to nodes with certain labels? (analogous to spec.template.spec.nodeSelector in a regular DaemonSet)

To Reproduce
Steps to reproduce the behavior:

  1. Add a node label query to the selector field of an EDS (see the sketch below)
  2. Pods are scheduled only on nodes matching the specified query
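A minimal sketch of such a manifest; the name, labels, and image are made up for illustration:

	apiVersion: datadoghq.com/v1alpha1
	kind: ExtendedDaemonSet
	metadata:
	  name: example-eds                # hypothetical name
	spec:
	  selector:
	    matchLabels:
	      disktype: ssd                # a node label; with the current behaviour,
	                                   # pods land only on nodes carrying this label
	  template:
	    metadata:
	      labels:
	        app: example-eds
	    spec:
	      containers:
	        - name: main
	          image: registry.example.com/agent:latest   # placeholder image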

Expected behavior

  1. selector is used for querying pods
  2. Nodes are queried using spec.template.spec.nodeSelector
  3. spec.template.spec.nodeSelector gets printed as "NODE SELECTOR" in the ERS listing (see the sketch below)
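A sketch of the same manifest under the expected behaviour (names still illustrative): selector queries pod labels, and node targeting goes through the pod template, as with a native DaemonSet.

	apiVersion: datadoghq.com/v1alpha1
	kind: ExtendedDaemonSet
	metadata:
	  name: example-eds
	spec:
	  selector:
	    matchLabels:
	      app: example-eds             # expected: a query over pod labels
	  template:
	    metadata:
	      labels:
	        app: example-eds           # must match spec.selector
	    spec:
	      nodeSelector:
	        disktype: ssd              # expected: the way to target nodes
	      containers:
	        - name: main
	          image: registry.example.com/agent:latest   # placeholder image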

Environment (please complete the following information):

  • Kubernetes version: 1.20.2
  • Kubernetes distribution: vanilla bare metal
@syndicut
Contributor Author

@clamoriniere sorry to bother you, could you have a look at this issue?

@syndicut
Contributor Author

syndicut commented Aug 6, 2021

It also looks like neither spec.template.spec.nodeSelector nor selector is considered when choosing canary nodes, so if you use labels to filter the nodes an EDS schedules onto, you also need to explicitly duplicate the selector in spec.strategy.canary.nodeSelector:
https://github.com/DataDog/extendeddaemonset/blob/main/controllers/extendeddaemonset/controller.go#L306
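A sketch of that workaround, reusing the illustrative disktype label and assuming spec.strategy.canary.nodeSelector takes a label selector (matchLabels):

	apiVersion: datadoghq.com/v1alpha1
	kind: ExtendedDaemonSet
	metadata:
	  name: example-eds
	spec:
	  strategy:
	    canary:
	      replicas: 1
	      nodeSelector:              # has to duplicate the node labels explicitly,
	        matchLabels:             # otherwise canary nodes are picked from all nodes
	          disktype: ssd
	  template:
	    spec:
	      nodeSelector:
	        disktype: ssd            # not considered during canary node selection
	      containers:
	        - name: main
	          image: registry.example.com/agent:latest   # placeholder image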

@syndicut
Contributor Author

@clamoriniere I think I can open a PR fixing this, but first I need to confirm: is my reading of the expected behaviour above correct?

@clamoriniere
Collaborator

Hello @syndicut,

Thanks for your bug report; indeed, something is not right with the usage of spec.selector.

Let me answer your second question first.

  1. What is the correct way to limit EDS pods to nodes with certain labels? (analogous to spec.template.spec.nodeSelector in a regular DaemonSet)

To deploy on specific nodes only, you have 3 standard solutions in Kubernetes:
1. Tolerations, if you have tainted your nodes
2. NodeSelector: a label selector over nodes
3. Affinity with nodeAffinity, which provides more granular configuration than NodeSelector

All 3 solutions need to be configured inside the pod template, ExtendedDaemonset.spec.template:

  1. ExtendedDaemonset.spec.template.spec.tolerations
  2. ExtendedDaemonset.spec.template.spec.nodeSelector
  3. ExtendedDaemonset.spec.template.spec.affinity.nodeAffinity

We use the same logic as the native DaemonSet. The official Kubernetes documentation covers nodeSelector and nodeAffinity under "Assigning Pods to Nodes", and tolerations under "Taints and Tolerations".
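A sketch combining the three options; the taint key, labels, and image are illustrative:

	apiVersion: datadoghq.com/v1alpha1
	kind: ExtendedDaemonSet
	metadata:
	  name: example-eds
	spec:
	  template:
	    spec:
	      tolerations:                 # 1. tolerate a taint set on the target nodes
	        - key: dedicated
	          operator: Equal
	          value: monitoring
	          effect: NoSchedule
	      nodeSelector:                # 2. plain label selector over nodes
	        disktype: ssd
	      affinity:                    # 3. more expressive node affinity
	        nodeAffinity:
	          requiredDuringSchedulingIgnoredDuringExecution:
	            nodeSelectorTerms:
	              - matchExpressions:
	                  - key: kubernetes.io/arch
	                    operator: In
	                    values: ["amd64"]
	      containers:
	        - name: main
	          image: registry.example.com/agent:latest   # placeholder image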


  1. Does the comment on this field need updating? It describes a label query over pods, but the field is actually used as a query over nodes.

I think we added ExtendedDaemonset.spec.selector to mimic the native DaemonSet, so the current description is correct.

But you are right: we should not use this label selector to list nodes there. IMO we need to remove that usage.

Luckily, we are not creating the ERS with it, which prevents us from excluding nodes that don't have the labels present in the selector. But that is also a bug; it just happens to save us from the first one.

You have also revealed a third bug, about canary node selection. Indeed, the way we select nodes doesn't take into consideration the nodeSelector, tolerations, and nodeAffinity present in the pod template; it only uses the nodeSelector and nodeAntiAffinityKeys present in the canary configuration. We should aggregate them before querying the API server for the node list.

Fixes to make

  • Remove the usage of ers.spec.selector to list nodes in the ERS controller.
  • Copy eds.spec.selector from the EDS to the ERS.
  • Improve how we select canary nodes by using PodTemplate.spec.nodeSelector, PodTemplate.spec.affinity.nodeAffinity, and PodTemplate.spec.tolerations of the canary ERS.spec.template (we already have a function that can be used to check whether a Pod can be scheduled on a specific Node).

I think you have properly discovered and understood the issues.
Feel free to open one or several pull requests to fix them, knowing that the first 2 bullets need to be fixed together; the last one, about canary node selection, can be done separately.

Please let us know if you plan to work on it, and thanks again for this great investigation 👍

@clamoriniere added the bug and component/controller labels on Sep 24, 2021
@syndicut
Contributor Author

Thanks for the clarification, I plan to work on it.
