🌱 improve getNodeRefMap #11071

Closed

cahillsf
Member

What this PR does / why we need it:
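Improves the efficiency of how the MachinePool (MP) controller retrieves Nodes: it looks up each Node through the providerID field index instead of always listing every Node.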

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #8856

/area machinepool

@k8s-ci-robot k8s-ci-robot added the area/machinepool Issues or PRs related to machinepools label Aug 19, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vincepri for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 19, 2024
@cahillsf
Member Author

not sure if we want to incorporate this check for more than one node matching a providerID:

if len(nodeList.Items) != 1 {
	return nil, fmt.Errorf("unexpectedly found more than one Node matching the providerID %s", providerID)
}

the full list approach does not catch this since it's just looping over the full list and adding entries in the map, so it would just be "last seen wins"
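
To make the trade-off concrete, here is a minimal sketch of how a full-list loop could still reject duplicate providerIDs instead of letting the last-seen Node win. The `node` type and the `buildNodeRefMap` helper are hypothetical stand-ins for illustration, not the actual Cluster API code.

```go
package main

import "fmt"

// node is a hypothetical stand-in for corev1.Node, reduced to the fields used here.
type node struct {
	Name       string
	ProviderID string
}

// buildNodeRefMap indexes nodes by providerID and returns an error on a
// duplicate providerID rather than silently keeping the last node seen.
func buildNodeRefMap(nodes []node) (map[string]*node, error) {
	refs := make(map[string]*node, len(nodes))
	for i := range nodes {
		id := nodes[i].ProviderID
		if id == "" {
			continue // a node without a providerID cannot be matched
		}
		if prev, ok := refs[id]; ok {
			return nil, fmt.Errorf("found more than one Node matching the providerID %s: %s and %s", id, prev.Name, nodes[i].Name)
		}
		refs[id] = &nodes[i]
	}
	return refs, nil
}

func main() {
	nodes := []node{
		{Name: "node-a", ProviderID: "aws:///us-east-1a/i-1"},
		{Name: "node-b", ProviderID: "aws:///us-east-1a/i-1"},
	}
	if _, err := buildNodeRefMap(nodes); err != nil {
		fmt.Println("error:", err)
	}
}
```

Whether a duplicate should be a hard error or only a logged warning is a design choice this sketch does not settle.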

@cahillsf cahillsf force-pushed the improve-getnoderefmap branch 4 times, most recently from fb48a60 to 170411c Compare August 20, 2024 00:44
@cahillsf cahillsf marked this pull request as draft August 20, 2024 01:03
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 20, 2024
@cahillsf
Member Author

/test pull-cluster-api-test-main

}
nodeRefsMap[providerID] = &nodeList.Items[0]
}
if !completeList || len(providerIDList) == 0 {
Member Author

we fall back to retrieving the whole list when len(providerIDList) == 0 because createOrUpdateMachines will pull the providerID from the infraMachines:

var providerID string
var node *corev1.Node
if err := util.UnstructuredUnmarshalField(infraMachine, &providerID, "spec", "providerID"); err != nil {
	log.V(4).Info("could not retrieve providerID for infraMachine", "infraMachine", klog.KObj(infraMachine))
} else {
	// Retrieve the Node for the infraMachine from the nodeRefsMap using the providerID.
	node = s.nodeRefMap[providerID]
}

prior to the providerID list in the machinePool.spec getting set from the infraConfig:

if !reflect.DeepEqual(mp.Spec.ProviderIDList, providerIDList) {
	mp.Spec.ProviderIDList = providerIDList

Member Author

unit tests such as this one:

t.Run("Should set `Running` when scaled from zero to one", func(t *testing.T) {

depend on this behavior
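
The fallback described in these comments can be sketched roughly as follows. This is a simplified in-memory model, not the PR's code: `node`, `nodeStore`, and `newNodeStore` are hypothetical, and the assumption that a missed lookup is what marks the result as incomplete is mine, since the diff context does not show where `completeList` is set to false.

```go
package main

import "fmt"

// node is a hypothetical stand-in for corev1.Node.
type node struct {
	Name       string
	ProviderID string
}

// nodeStore is a hypothetical in-memory stand-in for the cached client;
// byProviderID plays the role of the providerID field index.
type nodeStore struct {
	all          []node
	byProviderID map[string][]*node
}

func newNodeStore(nodes []node) *nodeStore {
	s := &nodeStore{all: append([]node(nil), nodes...), byProviderID: map[string][]*node{}}
	for i := range s.all {
		id := s.all[i].ProviderID
		s.byProviderID[id] = append(s.byProviderID[id], &s.all[i])
	}
	return s
}

// getNodeRefMap tries one indexed lookup per providerID and falls back to
// walking every node when the list is empty or a lookup found nothing.
func getNodeRefMap(s *nodeStore, providerIDList []string) map[string]*node {
	refs := make(map[string]*node)
	completeList := true
	for _, id := range providerIDList {
		matches := s.byProviderID[id]
		if len(matches) == 0 {
			completeList = false // assumption: a miss marks the result as incomplete
			continue
		}
		refs[id] = matches[0]
	}
	if !completeList || len(providerIDList) == 0 {
		for i := range s.all {
			refs[s.all[i].ProviderID] = &s.all[i]
		}
	}
	return refs
}

func main() {
	s := newNodeStore([]node{{"node-a", "p1"}, {"node-b", "p2"}})
	fmt.Println(len(getNodeRefMap(s, nil)))            // providerIDList not set yet: full walk
	fmt.Println(len(getNodeRefMap(s, []string{"p1"}))) // indexed lookup only
}
```

The empty-list branch is the part the scale-from-zero test relies on: the map has to be populated before `mp.Spec.ProviderIDList` has been set.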

@cahillsf cahillsf marked this pull request as ready for review August 22, 2024 13:01
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 22, 2024
@cahillsf
Member Author

cahillsf commented Aug 22, 2024


/test pull-cluster-api-e2e-main

if err := c.List(ctx, &nodeList, client.Continue(nodeList.Continue)); err != nil {
completeList := true
for _, providerID := range providerIDList {
if err := c.List(ctx, &nodeList, client.MatchingFields{index.NodeProviderIDField: providerID}); err != nil {
Member

@sbueringer sbueringer Aug 26, 2024

I'm wondering if it's really more efficient to do the list call for every single Node vs. doing one list call (e.g. are 100 Node List calls with a field selector better than just 1 without?)

(not sure in which way it might matter, but let's consider that both cases are just hitting the local cache)

Member Author

hm yeah i guess i took this for granted since the issue had been accepted and marked as help wanted

any thoughts on how i could go about proving out this efficiency improvement (or lack thereof 😄 )? or you think it's not worth the effort?
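
One rough way to explore the question would be a micro-benchmark along these lines. It is only a sketch under strong assumptions: the `cache` type is a hypothetical in-memory model (a slice plus a providerID map), not the real cached client, so any numbers it prints say nothing definitive about the controller itself. It compares 100 indexed lookups against one full list that copies every node and then filters.

```go
package main

import (
	"fmt"
	"testing"
)

// node is a hypothetical stand-in for corev1.Node.
type node struct {
	Name       string
	ProviderID string
}

// cache is a hypothetical in-memory model of a node cache with a providerID index.
type cache struct {
	nodes []node
	index map[string][]int // providerID -> positions in nodes
}

func newCache(n int) *cache {
	c := &cache{nodes: make([]node, n), index: make(map[string][]int, n)}
	for i := range c.nodes {
		id := fmt.Sprintf("provider://%d", i)
		c.nodes[i] = node{Name: fmt.Sprintf("node-%d", i), ProviderID: id}
		c.index[id] = append(c.index[id], i)
	}
	return c
}

// perIDLookup performs one indexed lookup per providerID.
func (c *cache) perIDLookup(ids []string) map[string]node {
	out := make(map[string]node, len(ids))
	for _, id := range ids {
		if pos := c.index[id]; len(pos) > 0 {
			out[id] = c.nodes[pos[0]]
		}
	}
	return out
}

// fullListLookup copies every node once, then keeps only the requested ones.
func (c *cache) fullListLookup(ids []string) map[string]node {
	wanted := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		wanted[id] = struct{}{}
	}
	all := append([]node(nil), c.nodes...) // models the copy a list call returns
	out := make(map[string]node, len(ids))
	for _, n := range all {
		if _, ok := wanted[n.ProviderID]; ok {
			out[n.ProviderID] = n
		}
	}
	return out
}

func main() {
	c := newCache(5000)
	ids := make([]string, 100)
	for i := range ids {
		ids[i] = fmt.Sprintf("provider://%d", i*50)
	}
	perID := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			c.perIDLookup(ids)
		}
	})
	full := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			c.fullListLookup(ids)
		}
	})
	fmt.Println("100 indexed lookups:", perID)
	fmt.Println("1 full list:        ", full)
}
```

A more faithful measurement would have to run against the actual cached client with realistic Node objects, which is a larger effort than this model.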

Member

Yup sorry I didn't realize this on the issue. It's probably not worth the effort to be honest

Member Author

sure thing, fine with me. will close both the PR and the issue then 👍

Member

Thx for working on it!!

Labels
area/machinepool Issues or PRs related to machinepools cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Consider using remote.NodeProviderIDIndex in MachinePool controller
3 participants