Cilium Cluster Scope IPAM melts DRBD storage fabric #705

Open
bernardgut opened this issue Sep 7, 2024 · 3 comments

bernardgut commented Sep 7, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Version

Cilium equal or higher than v1.16.0 and lower than v1.17.0
Piraeus Operator: v2.6.0

What happened?

Cross-posting cilium/cilium#34745 here, because it may be a DRBD config issue more than a Cilium issue.

Anyway, when I create a cluster with ipam.mode left at its default (not set), the interfaces on my nodes get assigned seemingly random IPs in 10.0.0.0/8. Another cluster exists on the same subnet (which is fine on its own), but when that other cluster also has DRBD nodes, DRBD goes off the rails with a lot of errors and taints the nodes with quorum lost.
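
For reference, if I read the Cilium Helm chart defaults correctly, leaving ipam.mode unset amounts to values roughly like the following (an illustrative sketch, not a dump from my cluster):

ipam:
  mode: cluster-pool                  # "Cluster Scope" IPAM, the chart default
  operator:
    clusterPoolIPv4PodCIDRList:
      - 10.0.0.0/8                    # default pool that per-node pod CIDRs are carved from
    clusterPoolIPv4MaskSize: 24       # per-node CIDR size

So two clusters installed with these defaults both allocate their per-node pod CIDRs out of the same 10.0.0.0/8 range.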

In dmesg I see errors like "unexpected connection from..."

When I check the node list in LINSTOR I see odd entries such as this:

k linstor node list                                                                                             
╭───────────────────────────────────────────────────╮
┊ Node ┊ NodeType  ┊ Addresses             ┊ State  ┊
╞═══════════════════════════════════════════════════╡
┊ n1   ┊ SATELLITE ┊ FD00::215:3367 (SSL)  ┊ Online ┊
┊ n2   ┊ SATELLITE ┊ 10.0.1.130:3367 (SSL) ┊ Online ┊
┊ n3   ┊ SATELLITE ┊ 10.0.0.100:3367 (SSL) ┊ Online ┊
╰───────────────────────────────────────────────────╯

The nodes have picked up Cilium clusterpool IPs (?) instead of an IP from the services CIDR (?).
FYI, my cluster is configured with:

nodes:    192.168.1.1/21, 2a02::...
pods:     fd00:0:a:1000::/64, 10.229.0.0/16
services: fd00:0:a:2000::/108, 10.116.0.0/16

and the other cluster has completely different IPAM CIDRs. The two clusters can simply talk to each other (two switches apart, no firewall in between).
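
To see which pod CIDRs Cilium actually handed to each node, something like this should work (from memory; the allocated CIDRs live under spec.ipam.podCIDRs of the CiliumNode objects, if I recall the CRD layout correctly):

kubectl get ciliumnodes.cilium.io -o custom-columns=NODE:.metadata.name,POD_CIDRS:.spec.ipam.podCIDRs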

The logs of the HA controller (piraeus-datastore) on the melted nodes are an infinite loop of:

k logs -n piraeus-datastore ha-controller-nmj7l                                                                                 
I0906 20:17:11.721589       1 agent.go:201] version: v1.2.1
I0906 20:17:11.721633       1 agent.go:202] node: n2
I0906 20:17:11.721666       1 agent.go:228] waiting for caches to sync
I0906 20:17:11.822332       1 agent.go:230] caches synced
I0906 20:17:11.822343       1 agent.go:253] starting reconciliation
I0906 20:17:21.721850       1 agent.go:253] starting reconciliation
I0906 20:17:31.722435       1 agent.go:253] starting reconciliation
I0906 20:17:41.722249       1 agent.go:253] starting reconciliation
I0906 20:17:51.725813       1 agent.go:253] starting reconciliation
I0906 20:18:01.722503       1 agent.go:253] starting reconciliation
I0906 20:18:11.722730       1 agent.go:253] starting reconciliation
I0906 20:18:21.721834       1 agent.go:253] starting reconciliation
I0906 20:18:31.725762       1 agent.go:253] starting reconciliation
I0906 20:18:41.721830       1 agent.go:253] starting reconciliation
I0906 20:18:51.721842       1 agent.go:253] starting reconciliation
I0906 20:19:01.725122       1 agent.go:253] starting reconciliation
I0906 20:19:11.722531       1 agent.go:253] starting reconciliation
I0906 20:19:21.725799       1 agent.go:253] starting reconciliation
I0906 20:19:31.721836       1 agent.go:253] starting reconciliation
I0906 20:19:31.721888       1 agent.go:440] updating node taints
I0906 20:19:41.721829       1 agent.go:253] starting reconciliation
I0906 20:19:51.725696       1 agent.go:253] starting reconciliation
I0906 20:20:01.721817       1 agent.go:253] starting reconciliation
I0906 20:20:11.721682       1 agent.go:253] starting reconciliation
I0906 20:20:21.721857       1 agent.go:253] starting reconciliation
I0906 20:20:31.725702       1 agent.go:253] starting reconciliation
I0906 20:20:41.721834       1 agent.go:253] starting reconciliation
I0906 20:20:51.725135       1 agent.go:253] starting reconciliation
I0906 20:21:01.721821       1 agent.go:253] starting reconciliation
I0906 20:21:11.721821       1 agent.go:253] starting reconciliation

If I use ipam.mode=kubernetes, I see:

 k linstor node list                                                         
╭──────────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType  ┊ Addresses                        ┊ State  ┊
╞══════════════════════════════════════════════════════════════╡
┊ n1   ┊ SATELLITE ┊ FD00:0:A:1000::C89E:3367 (SSL)   ┊ Online ┊
┊ n2   ┊ SATELLITE ┊ FD00:0:A:1000:1::E131:3367 (SSL) ┊ Online ┊
┊ n3   ┊ SATELLITE ┊ FD00:0:A:1000:2::A7CB:3367 (SSL) ┊ Online ┊
╰──────────────────────────────────────────────────────────────╯

which are the correct values from the pod CIDR. No quorum loss there.

I actually started noticing this when I turned on encryption strict mode in the prod cluster. But after digging a bit, I think there is an issue with ipam.mode=clusterpool somehow. Apparently this severely breaks storage across ALL clusters in 10.0.0.0/8. You have been warned.
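
For the record, moving off the default pool is a one-line Helm change; roughly something like this (release name and namespace are placeholders, not exactly what I ran):

# switch to Kubernetes host-scope IPAM (what gave me correct addresses above):
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set ipam.mode=kubernetes

# or stay on cluster-pool but give each cluster its own non-overlapping pool:
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set ipam.operator.clusterPoolIPv4PodCIDRList='{10.229.0.0/16}'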

Right now I am still trying to figure out what is going on, so this bug report will be somewhat vague; I am posting it here to see if anyone has an idea what is up with this.

How can we reproduce the issue?

  1. Deploy two clusters on adjacent networks with ipam.mode=clusterpool (a sketch follows the list).
  2. Deploy linstor-operator with a full DRBD config and an encrypted dataplane on both.
  3. Watch your storage fabric melt over time.
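
Roughly what step 1 looks like for me (chart repo, release name and encryption type are just what I happen to use; adjust to taste):

helm repo add cilium https://helm.cilium.io/
# run on cluster A and on cluster B; both stay on the default cluster-pool IPAM
helm install cilium cilium/cilium -n kube-system \
  --set ipam.mode=cluster-pool \
  --set encryption.enabled=true \
  --set encryption.type=wireguard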

Kernel Version

6.6.43-talos

Kubernetes Version

1.30.3

@bernardgut (Author)

I guess my first question here is this:

Does LINSTOR/DRBD support operating two separate instances in the same subnet?

(In my case it seems that two separate clusters' pod subnets accidentally ended up able to talk to each other through some Cilium config quirk.)

@WanzenBug (Member)

Does LINSTOR/DRBD support operating two separate instances in the same subnet?

Sure, we don't do anything fancy with the network. To LINSTOR/DRBD, the other cluster nodes are simply identified by their IP addresses.

What you seem to be experiencing is multiple Pods being assigned the same IP address. I'm guessing a Pod in cluster A is assigned the same IP address as a Pod in cluster B. That would explain the "unexpected connection from..." messages, and it would then cause some weird states in DRBD, I guess...
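
A quick way to check that hypothesis would be to diff the Pod IPs of both clusters, something along these lines (context names are placeholders):

kubectl --context cluster-a get pods -A -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' | sort -u > a.txt
kubectl --context cluster-b get pods -A -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' | sort -u > b.txt
comm -12 a.txt b.txt   # any output means the same Pod IP is in use in both clusters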

bernardgut (Author) commented Sep 9, 2024

Update:
After testing, it seems the following Cilium configs break DRBD quorum without recovery:

  • ipam.mode=clusterScope
  • encryption.enabled=true -> breaks after a while. Interestingly, no "unexpected connection from..." messages; quorum is just lost without much explanation.

Note: these problems don't appear in a virtualized deployment of Cilium (in my case Proxmox/QEMU machines), only on bare metal.

For anyone reading this and trying to deploy this in prod: you've been warned.

Re: conflicting IPs: I don't think two nodes had the same IP though... I think they were just assigned IPs from the wrong pool, but that's it. I can test again if I re-create another cluster on bare metal, but for now I have to move on.
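
If I do get around to re-testing, comparing the address each satellite actually registered should settle it; something like this (from memory, not re-run on the broken cluster):

k linstor node interface list n2   # shows which IP/interface the n2 satellite registered with the controller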
