🌱 improve getNodeRefMap #11071

cahillsf · 2024-08-19T23:23:56Z

What this PR does / why we need it:
Improves the efficiency of how the MachinePool (MP) controller retrieves Nodes.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #8856

/area machinepool

k8s-ci-robot · 2024-08-19T23:24:01Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vincepri for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

cahillsf · 2024-08-19T23:26:00Z

Not sure if we want to incorporate this check for more than one Node matching a providerID:

cluster-api/internal/controllers/machine/machine_controller_noderef.go

Lines 223 to 225 in f62294e

    
	if len(nodeList.Items) != 1 {
		return nil, fmt.Errorf("unexpectedly found more than one Node matching the providerID %s", providerID)
	}

The full-list approach does not catch this: it just loops over the full list and adds an entry to the map for each Node, so a duplicate providerID would simply be "last seen wins".
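If the duplicate check is wanted in the full-list path as well, one option is to detect the collision while building the map instead of silently overwriting the earlier entry. The Go sketch below is not the code in this PR: it assumes a hypothetical, trimmed-down `Node` type (only `Name` and `ProviderID`) in place of `corev1.Node` and a hypothetical `buildNodeRefMap` helper, and skipping Nodes with an empty providerID is an assumption of the sketch.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down stand-in for corev1.Node.
type Node struct {
	Name       string
	ProviderID string
}

// buildNodeRefMap indexes nodes by providerID from a single full list.
// Unlike a plain "last seen wins" loop, it returns an error when two Nodes
// share a providerID, similar to the check in the Machine controller.
func buildNodeRefMap(nodes []Node) (map[string]*Node, error) {
	refs := make(map[string]*Node, len(nodes))
	for i := range nodes {
		n := &nodes[i]
		if n.ProviderID == "" {
			// Assumption: Nodes without a providerID cannot be matched, so skip them.
			continue
		}
		if prev, ok := refs[n.ProviderID]; ok {
			return nil, fmt.Errorf("unexpectedly found more than one Node matching the providerID %s (%s, %s)",
				n.ProviderID, prev.Name, n.Name)
		}
		refs[n.ProviderID] = n
	}
	return refs, nil
}

func main() {
	nodes := []Node{
		{Name: "node-a", ProviderID: "provider:///a"},
		{Name: "node-b", ProviderID: "provider:///b"},
	}
	refs, err := buildNodeRefMap(nodes)
	fmt.Println(len(refs), err) // 2 <nil>
}
```

Returning an error here changes behavior for the whole MachinePool reconcile, so whether a duplicate should fail the map or only skip the affected providerID is a design choice this sketch does not settle.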

cahillsf · 2024-08-20T20:28:36Z

/test pull-cluster-api-test-main

cahillsf · 2024-08-22T13:01:15Z

exp/internal/controllers/machinepool_controller_phases.go

+		}
+		nodeRefsMap[providerID] = &nodeList.Items[0]
+	}
+	if !completeList || len(providerIDList) == 0 {


We fall back to retrieving the whole list when len(providerIDList) == 0 because createOrUpdateMachines pulls the providerID from the infraMachines:

cluster-api/exp/internal/controllers/machinepool_controller_phases.go

Lines 417 to 424 in c62c864

	var providerID string
	var node *corev1.Node
	if err := util.UnstructuredUnmarshalField(infraMachine, &providerID, "spec", "providerID"); err != nil {
		log.V(4).Info("could not retrieve providerID for infraMachine", "infraMachine", klog.KObj(infraMachine))
	} else {
		// Retrieve the Node for the infraMachine from the nodeRefsMap using the providerID.
		node = s.nodeRefMap[providerID]
	}

before the providerID list in machinePool.spec is set from the infraConfig:

cluster-api/exp/internal/controllers/machinepool_controller_phases.go

Lines 327 to 328 in c62c864

	if !reflect.DeepEqual(mp.Spec.ProviderIDList, providerIDList) {
		mp.Spec.ProviderIDList = providerIDList

Unit tests such as this one:

cluster-api/exp/internal/controllers/machinepool_controller_phases_test.go

Line 1958 in c62c864

t.Run("Should set `Running` when scaled from zero to one", func(t *testing.T) {

depend on this behavior.
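The lookup-then-fallback flow described above can be modeled in a few lines. This is a minimal in-memory sketch, not the PR's code: `nodeCache`, `newNodeCache`, and this `getNodeRefMap` signature are hypothetical, the `byProviderID` map stands in for a cache field index such as `index.NodeProviderIDField`, and treating any providerID that does not match exactly one Node as a reason to fall back is an assumption about the condition not shown in the diff hunk.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down stand-in for corev1.Node.
type Node struct {
	Name       string
	ProviderID string
}

// nodeCache is a hypothetical stand-in for the cached client: every Node plus
// an index keyed by providerID (similar in spirit to a registered field index).
type nodeCache struct {
	all          []Node
	byProviderID map[string][]Node
}

func newNodeCache(nodes []Node) *nodeCache {
	c := &nodeCache{all: nodes, byProviderID: make(map[string][]Node)}
	for _, n := range nodes {
		c.byProviderID[n.ProviderID] = append(c.byProviderID[n.ProviderID], n)
	}
	return c
}

// getNodeRefMap returns the providerID -> Node map and reports whether it had
// to fall back to scanning every Node.
func getNodeRefMap(c *nodeCache, providerIDList []string) (map[string]Node, bool) {
	refs := make(map[string]Node, len(providerIDList))
	complete := len(providerIDList) > 0 // an empty list always falls back
	for _, id := range providerIDList {
		matches := c.byProviderID[id] // models one field-selector List per providerID
		if len(matches) != 1 {
			complete = false // missing or ambiguous: stop and use the full list
			break
		}
		refs[id] = matches[0]
	}
	if complete {
		return refs, false
	}
	// Fallback: one pass over every Node, "last seen wins" on duplicates.
	refs = make(map[string]Node, len(c.all))
	for _, n := range c.all {
		if n.ProviderID != "" {
			refs[n.ProviderID] = n
		}
	}
	return refs, true
}

func main() {
	c := newNodeCache([]Node{
		{Name: "node-a", ProviderID: "provider:///a"},
		{Name: "node-b", ProviderID: "provider:///b"},
	})
	_, fellBack := getNodeRefMap(c, []string{"provider:///a"})
	fmt.Println(fellBack) // false
	refs, fellBack := getNodeRefMap(c, nil)
	fmt.Println(len(refs), fellBack) // 2 true
}
```

With an empty providerIDList the sketch always takes the full-list path, which is what lets createOrUpdateMachines resolve Nodes from the infraMachines' providerIDs before mp.Spec.ProviderIDList has been populated.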

cahillsf · 2024-08-22T13:02:27Z

/test pull-cluster-api-e2e-main

Looks like the same flake as on main: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api/11071/pull-cluster-api-e2e-main/1826605867118628864

cahillsf · 2024-08-22T19:12:00Z

/test pull-cluster-api-e2e-main

sbueringer · 2024-08-26T11:18:30Z

exp/internal/controllers/machinepool_controller_phases.go

-		if err := c.List(ctx, &nodeList, client.Continue(nodeList.Continue)); err != nil {
+	completeList := true
+	for _, providerID := range providerIDList {
+		if err := c.List(ctx, &nodeList, client.MatchingFields{index.NodeProviderIDField: providerID}); err != nil {


I'm wondering if it's really more efficient to do a List call for every single Node vs. doing one List call (e.g. are 100 Node List calls with a field selector better than just 1 without?)

(Not sure in which way it matters, but let's assume both cases just hit the local cache.)
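One rough way to compare the two approaches is an in-process microbenchmark. The Go sketch below is a hypothetical model, not controller-runtime's cache: a map keyed by providerID stands in for the field index, `deepCopy` stands in for the per-object copy a cached List typically returns, and the sizes (100 MachinePool Nodes in a 1000-Node cluster) are arbitrary assumptions. It uses the standard library's `testing.Benchmark`, which needs `testing.Init()` when run outside `go test`.

```go
package main

import (
	"fmt"
	"testing"
)

// Node is a hypothetical, trimmed-down stand-in for corev1.Node.
type Node struct {
	Name       string
	ProviderID string
	Labels     map[string]string
}

// deepCopy approximates the copy a cache-backed List makes for each object.
func deepCopy(n Node) Node {
	out := n
	out.Labels = make(map[string]string, len(n.Labels))
	for k, v := range n.Labels {
		out.Labels[k] = v
	}
	return out
}

// viaIndex models one field-selector List per providerID: each call only
// touches (and copies) the Nodes stored under that index key.
func viaIndex(index map[string][]Node, ids []string) map[string]Node {
	refs := make(map[string]Node, len(ids))
	for _, id := range ids {
		for _, n := range index[id] {
			refs[id] = deepCopy(n) // last seen wins if a key has duplicates
		}
	}
	return refs
}

// viaFullList models a single unfiltered List: every Node is copied and mapped.
func viaFullList(all []Node) map[string]Node {
	refs := make(map[string]Node, len(all))
	for _, n := range all {
		c := deepCopy(n)
		refs[c.ProviderID] = c
	}
	return refs
}

// makeNodes builds a synthetic cluster and the matching providerID index.
func makeNodes(total int) ([]Node, map[string][]Node) {
	all := make([]Node, total)
	index := make(map[string][]Node, total)
	for i := range all {
		id := fmt.Sprintf("provider:///node-%d", i)
		all[i] = Node{Name: fmt.Sprintf("node-%d", i), ProviderID: id, Labels: map[string]string{"pool": "mp"}}
		index[id] = append(index[id], all[i])
	}
	return all, index
}

func main() {
	testing.Init() // registers testing flags; required outside `go test`

	all, index := makeNodes(1000)
	ids := make([]string, 100) // pretend the MachinePool owns the first 100 Nodes
	for i := range ids {
		ids[i] = all[i].ProviderID
	}

	indexed := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			viaIndex(index, ids)
		}
	})
	full := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			viaFullList(all)
		}
	})
	fmt.Printf("100 indexed lookups: %d ns/op\n", indexed.NsPerOp())
	fmt.Printf("1 full list:         %d ns/op\n", full.NsPerOp())
}
```

In this model the cost is driven mainly by how many Node objects get copied: the indexed path scales with the pool size, the full List with the cluster size. Any conclusion about the real controller would still need a benchmark against an actual controller-runtime cache with the index registered.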

Hm, yeah, I guess I took this for granted since the issue had been accepted and marked as help wanted.

Any thoughts on how I could go about proving out this efficiency improvement (or lack thereof 😄)? Or do you think it's not worth the effort?

Yup, sorry, I didn't realize this on the issue. To be honest, it's probably not worth the effort.

Sure thing, fine with me. I'll close both the PR and the issue then 👍

Thx for working on it!!

k8s-ci-robot added the area/machinepool Issues or PRs related to machinepools label Aug 19, 2024

k8s-ci-robot requested review from fabriziopandini and sbueringer August 19, 2024 23:24

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 19, 2024

cahillsf force-pushed the improve-getnoderefmap branch 4 times, most recently from fb48a60 to 170411c August 20, 2024 00:44

cahillsf marked this pull request as draft August 20, 2024 01:03

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 20, 2024

cahillsf force-pushed the improve-getnoderefmap branch from 170411c to fb71224 August 20, 2024 16:59

cahillsf force-pushed the improve-getnoderefmap branch from fb71224 to fec41f2 August 22, 2024 12:56

cahillsf commented Aug 22, 2024

cahillsf marked this pull request as ready for review August 22, 2024 13:01

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 22, 2024

k8s-ci-robot requested review from enxebre and jackfrancis August 22, 2024 13:01

improve getNodeRefMap

b68af11

cahillsf force-pushed the improve-getnoderefmap branch from fec41f2 to b68af11 August 22, 2024 21:24

sbueringer reviewed Aug 26, 2024

cahillsf closed this Aug 26, 2024

cahillsf mentioned this pull request Aug 26, 2024

Consider using remote.NodeProviderIDIndex in MachinePool controller #8856

Closed