Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement RDMA subsystem mode change #666

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

e0ne
Copy link
Collaborator

@e0ne e0ne commented Mar 24, 2024

Now it's possible to configure RDMA subsystem mode using SR-IOV Network Operator in systemd mode.

We can't configure RDMA subsystem in a daemon mode because it should be done on host before any network namespace is created.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls
Copy link

coveralls commented Mar 24, 2024

Pull Request Test Coverage Report for Build 10772338186

Details

  • 53 of 188 (28.19%) changed or added relevant lines in 11 files are covered.
  • 5 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.4%) to 44.668%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/host/internal/lib/netlink/netlink.go 0 3 0.0%
pkg/daemon/writer.go 0 6 0.0%
controllers/sriovnetworknodepolicy_controller.go 0 7 0.0%
pkg/daemon/daemon.go 1 8 12.5%
api/v1/zz_generated.deepcopy.go 2 12 16.67%
pkg/host/internal/lib/netlink/mock/mock_netlink.go 0 11 0.0%
pkg/helper/mock/mock_helper.go 0 21 0.0%
pkg/host/mock/mock_host.go 0 21 0.0%
pkg/host/internal/network/network.go 0 24 0.0%
controllers/helper.go 49 74 66.22%
Files with Coverage Reduction New Missed Lines %
controllers/generic_network_controller.go 5 74.53%
Totals Coverage Status
Change from base Build 10734670028: -0.4%
Covered Lines: 6606
Relevant Lines: 14789

💛 - Coveralls

newFile := false
// remove the device plugin revision as we don't need it here
newState.Spec.DpConfigVersion = ""

// shared mode is a default on OS
rdmaMode := consts.RdmaSubsystemModeShared
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should try to query/change mode only in case if rdmaMode parameter is explicitly set in the poolConfig, to provide a safer behavior for ENVs which doesn't use RDMA.

@@ -152,6 +152,17 @@ func phasePre(setupLog logr.Logger, conf *systemd.SriovConfig, hostHelpers helpe
hostHelpers.TryEnableTun()
hostHelpers.TryEnableVhostNet()

rdmaSubsystem, err := hostHelpers.GetRDMASubsystem()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should execute this logic only if mode configuration is explicitly requested by a user.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// +kubebuilder:validation:Enum=shared;exclusive
// RDMA subsystem. Allowed value "shared", "exclusive".
RdmaMode string `json:"rdmaMode,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option is only valid for systemd mode?
Do we want to document this somehow?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done as log message in a SriovNetworkPoolConfig controller

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link
Collaborator

@ykulazhenkov ykulazhenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added few additional comments

if conf.RdmaMode != "" {
rdmaSubsystem, err := hostHelpers.GetRDMASubsystem()
if err != nil {
setupLog.Error(err, "failed to get RDMA subsystem mode")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If conf.RdmaMode is not empty string, then the user explicitly requested RDMA mode configuration. I think we can return error in this case. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if rdmaSubsystem != conf.RdmaMode {
err = hostHelpers.SetRDMASubsystem(conf.RdmaMode)
if err != nil {
setupLog.Error(err, "failed to set RDMA subsystem mode")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to return error here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want, thanks!

@@ -522,6 +522,34 @@ func (k *kernel) InstallRDMA(packageManager string) error {
return nil
}

func (k *kernel) GetRDMASubsystem() (string, error) {
log.Log.Info("GetRDMASubsystem(): retrieving RDMA subsystem mode")
chrootDefinition := utils.GetChrootExtension()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have helper to enter chroot (part of utilsHelper). Do we want to use it here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd got the same implementation in all `kernel' methods. Let's do it in a scope of a separate PR

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

github-actions bot commented Apr 3, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@e0ne
Copy link
Collaborator Author

e0ne commented Apr 15, 2024

@SchSeba could you please review this PR?

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link
Collaborator

@SchSeba SchSeba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work getting close to be ready :)

@@ -23,9 +23,14 @@ import (
// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.

type System struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please move this one below the SriovNetworkNodeStateSpec struct

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -152,6 +153,21 @@ func phasePre(setupLog logr.Logger, conf *systemd.SriovConfig, hostHelpers helpe
hostHelpers.TryEnableTun()
hostHelpers.TryEnableVhostNet()

if conf.Spec.System.RdmaMode != "" {
rdmaSubsystem, err := netlink.RdmaSystemGetNetnsMode()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the hostHelper interface don't call the netlink lib directly so it will be easier to add unit-tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return fmt.Errorf("failed to get RDMA subsystem mode: %v", err)
}
if rdmaSubsystem != conf.Spec.System.RdmaMode {
err = netlink.RdmaSystemSetNetnsMode(conf.Spec.System.RdmaMode)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the hostHelper interface don't call the netlink lib directly so it will be easier to add unit-tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -58,3 +58,6 @@ rules:
- apiGroups: [ "config.openshift.io" ]
resources: [ "infrastructures" ]
verbs: [ "get", "list", "watch" ]
- apiGroups: [ "sriovnetwork.openshift.io" ]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this one as the daemon will continue to only see the nodeState

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -127,7 +128,13 @@ func (w *NodeStateStatusWriter) pollNicStatus() error {
if err != nil {
return err
}
rdmaMode, err = w.hostHelper.GetRDMASubsystem()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please lets have a nice function in hostHelper like DiscoverSriovDevices to DiscoverSystemConfig() or something like that it will return System struct

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to DiscoverRDMASubsystem. It will always return a string so there is no need to introduce a new type

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@adrianchiris
Copy link
Collaborator

@e0ne can you rebase the pr ?

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

adrianchiris
adrianchiris previously approved these changes Jun 25, 2024
@@ -161,3 +171,94 @@ func AnnotateNode(ctx context.Context, nodeName string, key, value string, c cli

return AnnotateObject(ctx, node, key, value, c)
}

func FindNodePoolConfig(ctx context.Context, node *corev1.Node, c client.Client) (*sriovnetworkv1.SriovNetworkPoolConfig, []corev1.Node, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add docstring

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also im thinking we should have two functions:

  1. find node pool for node
  2. find nodes for node pool (with special handling for case where default node pool was provided)

WDYT ?

Also please add UT for whatever we end up with

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -26,6 +28,14 @@ const (
controlPlaneNodeLabelKey = "node-role.kubernetes.io/control-plane"
)

var (
oneNode = intstr.FromInt32(1)
defaultNpcl = &sriovnetworkv1.SriovNetworkPoolConfig{Spec: sriovnetworkv1.SriovNetworkPoolConfigSpec{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we use full name here ? e.g defaultPoolConfig ?

also the 'l' at the end is not related

return nil, nil, err
}

// list all the nodes that are also part of this pool and return them
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for those nodes why arent we validating they match exactly one ncp ? like in L223

@@ -420,6 +421,13 @@ func (dn *Daemon) nodeStateSyncHandler() error {
// When using systemd configuration we write the file
if vars.UsingSystemdMode {
log.Log.V(0).Info("nodeStateSyncHandler(): writing systemd config file to host")
// get node object
node := &corev1.Node{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont see node is being used in this scope.

@@ -92,6 +92,23 @@ spec:
mountPath: /host/etc/os-release
readOnly: true
{{- end }}
{{- if .RDMACNIImage }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi please rebase this PR now that we merged the rdma-cni deployment

@@ -152,6 +152,21 @@ func phasePre(setupLog logr.Logger, conf *systemd.SriovConfig, hostHelpers helpe
hostHelpers.TryEnableTun()
hostHelpers.TryEnableVhostNet()

if conf.Spec.System.RdmaMode != "" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this one as we do the configure via the modeprobe file

@@ -114,10 +115,15 @@ type OVSUplinkConfigExt struct {
Interface OVSInterfaceConfig `json:"interface,omitempty"`
}

type System struct {
RdmaMode string `json:"rdmaMode,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can add here also

// +kubebuilder:validation:Enum=shared;exclusive
// RDMA subsystem. Allowed value "shared", "exclusive".

@@ -269,6 +270,13 @@ func (r *SriovNetworkNodePolicyReconciler) syncAllSriovNetworkNodeStates(ctx con
ns.Name = node.Name
ns.Namespace = vars.Namespace
j, _ := json.Marshal(ns)
netPoolConfig, _, err := utils.FindNodePoolConfig(context.Background(), &node, r.Client)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the context from the function don't create a new one

@@ -73,6 +73,19 @@ func (r *SriovNetworkPoolConfigReconciler) Reconcile(ctx context.Context, req ct
return reconcile.Result{}, err
}

// RdmaMode could be set in systemd mode only
if instance.Spec.RdmaMode != "" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this one as we support this on both modes

@@ -522,6 +523,29 @@ func (k *kernel) InstallRDMA(packageManager string) error {
return nil
}

func (k *kernel) DiscoverRDMASubsystem() (string, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can move this function to the network or sriov package

@@ -522,6 +523,29 @@ func (k *kernel) InstallRDMA(packageManager string) error {
return nil
}

func (k *kernel) DiscoverRDMASubsystem() (string, error) {
log.Log.Info("DiscoverRDMASubsystem(): retrieving RDMA subsystem mode")
subsystem, err := netlink.RdmaSystemGetNetnsMode()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the netlink interface in the project so we can have a mock for it on unit tests

return subsystem, nil
}

func (k *kernel) SetRDMASubsystem(mode string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is no needed now that we use the modprobe file

@@ -161,3 +171,94 @@ func AnnotateNode(ctx context.Context, nodeName string, key, value string, c cli

return AnnotateObject(ctx, node, key, value, c)
}

func FindNodePoolConfig(ctx context.Context, node *corev1.Node, c client.Client) (*sriovnetworkv1.SriovNetworkPoolConfig, []corev1.Node, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@SchSeba
Copy link
Collaborator

SchSeba commented Aug 27, 2024

Hi @e0ne can you please rebase the PR?

@e0ne
Copy link
Collaborator Author

e0ne commented Aug 27, 2024

Hi @e0ne can you please rebase the PR?

done

@e0ne e0ne force-pushed the rdma-subsytem-mode branch 2 times, most recently from e294415 to 288e028 Compare August 28, 2024 08:34
Copy link
Collaborator

@SchSeba SchSeba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work!

I left some small comments

}
return defaultNpcl, defaultNodeLists, nil
}
return utils.FindNodePoolConfig(ctx, node, dr.Client)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we put this in the helper of the controllers?
I don't want to utils to start growing again

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes sense. done

@@ -272,6 +273,13 @@ func (r *SriovNetworkNodePolicyReconciler) syncAllSriovNetworkNodeStates(ctx con
ns.Name = node.Name
ns.Namespace = vars.Namespace
j, _ := json.Marshal(ns)
netPoolConfig, _, err := utils.FindNodePoolConfig(ctx, &node, r.Client)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a general todo here we should have in memory map I think for this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please elaborate on this?

@@ -23,6 +24,8 @@ import (
"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/vars"
)

var ManifestsPath = "./bindata/manifests"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets put this in consts

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed anymore, so I deleted it

if mode == "exclusive" {
modeValue = 0
}
config := fmt.Sprintf("options ib_core netns_mode=%d\n", modeValue)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you use this we can remove the other file in the manifests no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's deleted now

modeValue = 0
}
config := fmt.Sprintf("options ib_core netns_mode=%d\n", modeValue)
err := os.WriteFile("/etc/modprobe.d/ib_core.conf", []byte(config), 0644)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the getExtention here so we know if we are not inside a chroot

return fmt.Errorf("failed to write ib_core config: %v", err)
}

err = os.WriteFile(path.Join(consts.Chroot, "/etc/modprobe.d/ib_core.conf"), []byte(config), 0644)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a duplicate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase issue, it's deleted now

@@ -77,6 +77,14 @@ func RenderDir(manifestDir string, d *RenderData) ([]*unstructured.Unstructured,
return out, nil
}

func RenderToString(path string, d *RenderData) (string, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this function where we use it?

@@ -161,3 +171,94 @@ func AnnotateNode(ctx context.Context, nodeName string, key, value string, c cli

return AnnotateObject(ctx, node, key, value, c)
}

func FindNodePoolConfig(ctx context.Context, node *corev1.Node, c client.Client) (*sriovnetworkv1.SriovNetworkPoolConfig, []corev1.Node, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move this function to the helpers in controllers better then adding more stuff to utils

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants