
Only configure IPv6 RAs when ipv6 mode enabled #4810

Open
wants to merge 1 commit into base: master

Conversation

@trozet (Contributor) commented Oct 28, 2024

Without this fix, ovn-controller would spam:
2024-10-28T16:58:44.283Z|04438|pinctrl|WARN|Invalid IPv6 prefixes:

@trozet trozet requested a review from a team as a code owner October 28, 2024 20:25
@trozet trozet requested a review from girishmg October 28, 2024 20:25
tssurya
tssurya previously approved these changes Oct 28, 2024
@@ -403,7 +403,7 @@ func (gw *GatewayManager) GatewayInit(
types.NetworkExternalID: gw.netInfo.GetNetworkName(),
types.TopologyExternalID: gw.netInfo.TopologyType(),
}
if gw.netInfo.IsPrimaryNetwork() && gw.netInfo.TopologyType() == types.Layer2Topology {
if gw.netInfo.IsPrimaryNetwork() && gw.netInfo.TopologyType() == types.Layer2Topology && config.IPv6Mode {
Member:

lgtm

I pray some unit test somewhere screams; I'll be sad otherwise, because that would mean we are not testing single-stack in the UDN UTs either...

Contributor Author:

I didn't see any tests added in https://github.com/ovn-org/ovn-kubernetes/pull/4750/files#diff-bc68463ae093f3bc3451040d0aefa884f981e4d4cee8583948af24a24e2b0b9b

I don't see any tests for UDN in gateway.go. We probably want to add some to ensure it doesn't regress in the future. @qinqon fyi

Contributor Author:

I spoke too soon! UTs are failing; it's checked in a different place:

2024-10-28T20:47:32.0392453Z Summarizing 4 Failures:
2024-10-28T20:47:32.0394836Z   [FAIL] OVN Multi-Homed pod operations for layer2 network reconciles a new [It] pod on a user defined primary network
2024-10-28T20:47:32.0397915Z   /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/ovn/secondary_layer2_network_controller_test.go:161
2024-10-28T20:47:32.0400471Z   [FAIL] OVN Multi-Homed pod operations for layer2 network reconciles a new [It] pod on a user defined primary network on an IC cluster
2024-10-28T20:47:32.0402407Z   /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/ovn/secondary_layer2_network_controller_test.go:161
2024-10-28T20:47:32.0403952Z   [FAIL] OVN Multi-Homed pod operations for layer2 network reconciles a new [It] pod on a user defined primary network on an IC cluster; LGW
2024-10-28T20:47:32.0405404Z   /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/ovn/secondary_layer2_network_controller_test.go:161
2024-10-28T20:47:32.0407011Z   [FAIL] OVN Multi-Homed pod operations for layer2 network reconciles a new [It] pod on a user defined primary network on an IC cluster with per-pod SNATs enabled
2024-10-28T20:47:32.0408536Z   /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/ovn/secondary_layer2_network_controller_test.go:161
2024-10-28T20:41:44.4461783Z               expected: <*nbdb.LogicalRouterPort | 0xc03030dc80>{
2024-10-28T20:41:44.4462500Z                   UUID: "rtos-isolatednet_ovn_layer2_switch-UUID",
2024-10-28T20:41:44.4462974Z                   DhcpRelay: nil,
2024-10-28T20:41:44.4463334Z                   Enabled: nil,
2024-10-28T20:41:44.4463699Z                   ExternalIDs: {
2024-10-28T20:41:44.4464218Z                       "k8s.ovn.org/topology": "layer2",
2024-10-28T20:41:44.4464818Z                       "k8s.ovn.org/network": "isolatednet",
2024-10-28T20:41:44.4465201Z                   },
2024-10-28T20:41:44.4465408Z                   GatewayChassis: nil,
2024-10-28T20:41:44.4465616Z                   HaChassisGroup: nil,
2024-10-28T20:41:44.4465905Z                   Ipv6Prefix: nil,
2024-10-28T20:41:44.4466095Z                   Ipv6RaConfigs: {
2024-10-28T20:41:44.4466426Z                       "address_mode": "dhcpv6_stateful",
2024-10-28T20:41:44.4466625Z                       "mtu": "1400",
2024-10-28T20:41:44.4466964Z                       "send_periodic": "true",
2024-10-28T20:41:44.4467216Z                       "max_interval": "900",
2024-10-28T20:41:44.4467462Z                       "min_interval": "300",
2024-10-28T20:41:44.4467761Z                       "router_preference": "LOW",
2024-10-28T20:41:44.4467992Z                   },

Contributor Author:

The test cases should pass now. There is no coverage for IPv6. I tried to add it, but everything is hardcoded to IPv4 in these tests. It's going to take some time (which I don't have) to fix it. I added a FIXME, and hopefully someone else can go fix the tests. For now, at least the tests work correctly with IPv4.

Member:

Yeah, no problem. Do you want to open an upstream issue to track the missing v6 tests? Maybe @maiqueb has some time, since he knows these tests the best.

Contributor:

I'll try to improve IPv6 coverage in the following days.

@github-actions github-actions bot added the area/unit-testing Issues related to adding/updating unit tests label Oct 28, 2024
netName: secondaryNetworkName,
nadName: namespacedName(ns, nadName),
topology: ovntypes.Layer2Topology,
clustersubnet: subnets,
Member:

nit: why the rename from clustersubnets to clustersubnet? The value still seems plural?

Contributor Author:

It only takes one string value.

Member:

But the single string value is supposed to come like that from the NAD, right? We pass "v4,v6", which is parsed on the ",", so it can carry multiple subnets in the same string value; that's why the variable name was plural.

Contributor:

I also think this change is not needed. It sets us back in fact - once we enable the tests to work for dual stack, we'll need to put this back the way it originally was.

Contributor Author:

The pattern here is inconsistent: there are functions taking a plural name for what is a single string. We shouldn't pass CSV down to functions. If the intention is to pass the CNI conf as CSV to a test, that's one thing, but internally we should not be using CSV in our variables. I'll change it back for now, but when multiple values are supported we need to change the type to a slice or map.
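The refactor suggested above could look something like the following sketch. The helper name `parseClusterSubnets` is hypothetical (not from this PR): the idea is to split the NAD's comma-separated subnet string once at the boundary and pass a typed slice internally, rather than threading raw CSV through function signatures.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseClusterSubnets is a hypothetical helper illustrating the proposed
// refactor: parse the "v4,v6" CSV from the NAD once, returning typed
// *net.IPNet values instead of passing the raw string downward.
func parseClusterSubnets(csv string) ([]*net.IPNet, error) {
	var subnets []*net.IPNet
	for _, s := range strings.Split(csv, ",") {
		s = strings.TrimSpace(s)
		if s == "" {
			continue
		}
		_, ipNet, err := net.ParseCIDR(s)
		if err != nil {
			return nil, fmt.Errorf("invalid cluster subnet %q: %v", s, err)
		}
		subnets = append(subnets, ipNet)
	}
	return subnets, nil
}

func main() {
	// A dual-stack value as it might appear in a NAD spec (illustrative CIDRs).
	subnets, err := parseClusterSubnets("10.128.0.0/14, fd00:10:244::/48")
	if err != nil {
		panic(err)
	}
	for _, s := range subnets {
		fmt.Println(s)
	}
}
```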

netName string
nadName string
clustersubnet string
hostsubnet string // not used in layer2 tests
Member:

Here too; maybe it's intentional, because you want to signal that today's tests only support one address-family subnet?

@tssurya tssurya added the feature/kubevirt-live-migration All issues related to kubevirt live migration label Oct 28, 2024
@@ -403,7 +403,7 @@ func (gw *GatewayManager) GatewayInit(
types.NetworkExternalID: gw.netInfo.GetNetworkName(),
types.TopologyExternalID: gw.netInfo.TopologyType(),
}
if gw.netInfo.IsPrimaryNetwork() && gw.netInfo.TopologyType() == types.Layer2Topology {
if gw.netInfo.IsPrimaryNetwork() && gw.netInfo.TopologyType() == types.Layer2Topology && config.IPv6Mode {
Contributor:

What about single-stack UDNs created on dual-stack clusters?
Should we rely on gw.netInfo.IPMode() instead?

Member (@tssurya, Oct 29, 2024):

Even if it's a single-stack UDN on a dual-stack cluster, I'd think we need the v6 options set, because the network is dual-stack, right? A user should be able to add a v6 subnet as a day-2 config, converting the network from single-stack to dual-stack if the cluster supports it, right? Or is that not a thing? I regret this thought because I don't know if the subnets field is even supposed to be editable, but it should be expandable like the default network, right? So how exactly does dual-stack cluster conversion work? If we had older pods, would the restart create v6 IPs for existing pods?

Contributor:

> even if its a single stack UDN on a dualstack cluster I'd think we need the v6 options set cause the network is dualstack right? A user should be able to add a v6 subnet as a day2 config of converting the network from singlestack to dualstack if cluster supports it right? OR is that not a thing ?

I disagree. The user defined the network as single-stack. And the UDN is immutable.

What the user can do is create another dual-stack immutable UDN egressing to the dual-stack network.

I agree w/ @kyrtapz that what's important is the NetInfo specification, as long as the underlying network supports the user's intent: a single-stack UDN on a dual-stack cluster is OK; a dual-stack UDN on a single-stack cluster is not OK.

Contributor Author:

I'll check both: that the UDN is IPv6 and that IPv6 is allowed globally in ovnk. Does that work?

Contributor:

Makes sense to me.

Member:

Hmm, what I meant was: is there any harm in setting these specific configs even if the NAD is single-stack? But yeah, taking both configs into account is probably best; it likely needs a util of its own to be reusable.


Without this fix, ovn-controller would spam:
2024-10-28T16:58:44.283Z|04438|pinctrl|WARN|Invalid IPv6 prefixes:

Signed-off-by: Tim Rozet <trozet@redhat.com>
@trozet (Contributor Author) commented Oct 30, 2024

flake?

2024-10-29T20:33:13.5859902Z Summarizing 1 Failure:
2024-10-29T20:33:13.5861313Z   [FAIL] Hybrid SDN Master Operations [It] handles a HO node is switched to a OVN node
2024-10-29T20:33:13.5863482Z   /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/ovn/hybrid_test.go:1387


2024-10-29T20:30:02.7968250Z • [FAILED] [2.646 seconds]
2024-10-29T20:30:02.7969505Z Hybrid SDN Master Operations [It] handles a HO node is switched to a OVN node
2024-10-29T20:30:02.7971241Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/ovn/hybrid_test.go:1218
2024-10-29T20:30:02.7972179Z
2024-10-29T20:30:02.7972533Z   [FAILED] Timed out after 2.001s.
2024-10-29T20:30:02.7973131Z   Expected
2024-10-29T20:30:02.7973798Z       <[]*nbdb.LogicalRouterStaticRoute | len:2, cap:2>: [
2024-10-29T20:30:02.7974486Z           {
2024-10-29T20:30:02.7975313Z               UUID: "c2069cdb-3084-485e-94c0-784e8746451c",
2024-10-29T20:30:02.7976255Z               BFD: nil,
2024-10-29T20:30:02.7976859Z               ExternalIDs: {
2024-10-29T20:30:02.7977888Z                   "name": "hybrid-subnet-node1:node-windows",
2024-10-29T20:30:02.7978545Z               },
2024-10-29T20:30:02.7979122Z               IPPrefix: "10.1.3.0/24",
2024-10-29T20:30:02.7979821Z               Nexthop: "10.1.1.3",
2024-10-29T20:30:02.7980442Z               Options: nil,
2024-10-29T20:30:02.7981042Z               OutputPort: nil,
2024-10-29T20:30:02.7981627Z               Policy: nil,
2024-10-29T20:30:02.7982219Z               RouteTable: "",
2024-10-29T20:30:02.7982722Z           },
2024-10-29T20:30:02.7983254Z           {
2024-10-29T20:30:02.7984159Z               UUID: "49d62bcb-cf7f-4d6a-8fb5-26cc4b76b8d3",
2024-10-29T20:30:02.7984892Z               BFD: nil,
2024-10-29T20:30:02.7985461Z               ExternalIDs: {
2024-10-29T20:30:02.7986372Z                   "name": "hybrid-subnet-node1-gr",
2024-10-29T20:30:02.7987020Z               },
2024-10-29T20:30:02.7987589Z               IPPrefix: "10.1.3.0/24",
2024-10-29T20:30:02.7988301Z               Nexthop: "100.64.0.1",
2024-10-29T20:30:02.7988939Z               Options: nil,
2024-10-29T20:30:02.7989542Z               OutputPort: nil,
2024-10-29T20:30:02.7990141Z               Policy: nil,
2024-10-29T20:30:02.7990878Z               RouteTable: "",
2024-10-29T20:30:02.7991376Z           },
2024-10-29T20:30:02.7991875Z       ]
2024-10-29T20:30:02.7992324Z   to have length 0

Some logs around the failure:

2024-10-29T20:30:00.9272643Z I1029 20:30:00.820258   37573 obj_retry.go:235] Iterate retry objects already requested (resource *v1.Pod)
2024-10-29T20:30:00.9275273Z E1029 20:30:00.820332   37573 obj_retry.go:671] Failed to update *v1.Node, old=node-windows, new=node-windows, error: [macAddress annotation not found for node node-windows; error: could not find "k8s.ovn.org/node-mgmt-port-mac-addresses" annotation, failed to set up hybrid overlay logical switch port for node-windows: cannot set up hybrid overlay ports, distributed router ip is nil, k8s.ovn.org/l3-gateway-config annotation not found for node "node-windows"]
2024-10-29T20:30:00.9277905Z I1029 20:30:00.820388   37573 default_network_controller.go:705] Recording error event for node node-windows
2024-10-29T20:30:00.9278771Z I1029 20:30:00.820419   37573 ovn.go:104] Posting a Warning event for node node-windows
2024-10-29T20:30:00.9279630Z I1029 20:30:00.820483   37573 obj_retry.go:555] Update event received for resource *v1.Node, old object is equal to new: false
2024-10-29T20:30:00.9280576Z I1029 20:30:00.820525   37573 obj_retry.go:607] Update event received for *v1.Node node-windows
2024-10-29T20:30:00.9281345Z I1029 20:30:00.820642   37573 master.go:550] Adding or Updating Node "node-windows"
2024-10-29T20:30:00.9282527Z I1029 20:30:00.820888   37573 base_network_controller.go:492] When adding node node-windows for network default, found 0 pods to add to retryPods
2024-10-29T20:30:00.9283547Z I1029 20:30:00.820923   37573 obj_retry.go:235] Iterate retry objects already requested (resource *v1.Pod)
2024-10-29T20:30:00.9286217Z E1029 20:30:00.820955   37573 obj_retry.go:671] Failed to update *v1.Node, old=node-windows, new=node-windows, error: [macAddress annotation not found for node node-windows; error: could not find "k8s.ovn.org/node-mgmt-port-mac-addresses" annotation, failed to set up hybrid overlay logical switch port for node-windows: cannot set up hybrid overlay ports, distributed router ip is nil, k8s.ovn.org/l3-gateway-config annotation not found for node "node-windows"]

https://github.com/ovn-org/ovn-kubernetes/actions/runs/11581677910/job/32242971970?pr=4810

Labels
area/unit-testing Issues related to adding/updating unit tests
feature/kubevirt-live-migration All issues related to kubevirt live migration
Projects
Status: Todo
4 participants