
DNS problem for k3s multicloud cluster #10900

Open · allnightlong opened this issue Sep 16, 2024 · Discussed in #10897 · 12 comments
@allnightlong commented Sep 16, 2024

Discussed in #10897

Originally posted by allnightlong September 15, 2024
I'm building my cluster with nodes from different datacenters. The cluster has lived in one DC for some time with 5 nodes (1 server + 4 agents). Now I'm adding a new node in a different DC.
I'm using this tutorial as an example: https://docs.k3s.io/networking/distributed-multicloud#embedded-k3s-multicloud-solution

For server:

--node-external-ip=<SERVER_EXTERNAL_IP> --flannel-backend=wireguard-native --flannel-external-ip

For agent:

--node-external-ip=<AGENT_EXTERNAL_IP>

The problem is that none of the agent's pods can resolve any hostname. I'm following the official DNS debugging guide (https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/), and nslookup fails for both internal and external queries.
Internal:

kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached


command terminated with exit code 1

External:

kubectl exec -i -t dnsutils -- nslookup goo.gl                                  
;; connection timed out; no servers could be reached


command terminated with exit code 1

External with cloudflare's dns:

kubectl exec -i -t dnsutils -- nslookup goo.gl 1.1.1.1
Server:         1.1.1.1
Address:        1.1.1.1#53

Non-authoritative answer:
Name:   goo.gl
Address: 142.250.193.238
Name:   goo.gl
Address: 2404:6800:4002:81d::200e

What could be the cause of this DNS issue, and how can I resolve it?

P.S. I'm using k3s version v1.30.4+k3s1 (latest at the time of writing) both for server and agents.


@brandond (Contributor)

This indicates that the wireguard mesh between nodes isn't functioning properly, and that DNS traffic between the affected node and the node running the coredns pod is being dropped. Ensure that you've opened all the required ports for wireguard, and that node external IPs are set correctly so that wireguard can establish the mesh between nodes.

@allnightlong (Author) commented Sep 16, 2024

Hi, @brandond , thank you for the answer.

Here is my cluster state:

k get no -o wide                                                                                                                    
NAME           STATUS     ROLES                       AGE   VERSION        INTERNAL-IP   EXTERNAL-IP       OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
core           Ready      control-plane,core,master   15d   v1.30.4+k3s1   10.0.1.4      146.185.xxx.xxx   Ubuntu 24.04.1 LTS   6.8.0-41-generic   containerd://1.7.20-k3s1
node-iota      Ready      node                        8d    v1.30.4+k3s1   10.0.1.2      <none>            Ubuntu 24.04.1 LTS   6.8.0-41-generic   containerd://1.7.20-k3s1
node-kappa     Ready      node                        22h   v1.30.4+k3s1   10.0.1.99     109.120.xxx.xx    Ubuntu 24.04.1 LTS   6.8.0-44-generic   containerd://1.7.20-k3s1
node-lambda    Ready      node                        22h   v1.30.4+k3s1   10.0.1.98     109.120.xxx.xx    Ubuntu 24.04.1 LTS   6.8.0-44-generic   containerd://1.7.20-k3s1
node-theta     Ready      node                        8d    v1.30.4+k3s1   10.0.1.8      <none>            Ubuntu 24.04.1 LTS   6.8.0-41-generic   containerd://1.7.20-k3s1

The main node core and the nodes node-iota and node-theta are in dc1. Nodes node-kappa and node-lambda are in dc2.

I'm checking connectivity according to this page: https://docs.k3s.io/installation/requirements#networking.

From the core node I'm able to connect to node-lambda via TCP on port 10250 and via UDP on port 51820.

From node-lambda I can connect to core via TCP port 6443, TCP port 10250, and UDP port 51820.
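For reference, the connectivity checks described above can be expressed as commands. This is a sketch using the masked addresses from this thread; substitute your real node IPs:

```shell
# Connectivity checks from the k3s networking requirements page.
CORE=146.185.xxx.xxx      # core server external IP (placeholder)
LAMBDA=109.120.xxx.xx     # node-lambda external IP (placeholder)

# From core towards node-lambda:
nc -zv  "$LAMBDA" 10250   # kubelet metrics (TCP)
nc -zvu "$LAMBDA" 51820   # flannel wireguard-native (UDP)

# From node-lambda towards core:
nc -zv  "$CORE" 6443      # k3s supervisor / API server (TCP)
nc -zv  "$CORE" 10250     # kubelet metrics (TCP)
nc -zvu "$CORE" 51820     # flannel wireguard-native (UDP)
```

Note that a UDP "success" from nc only means nothing actively rejected the probe; it cannot prove the wireguard port is truly reachable end to end.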

Here is my config for core server node:

cat /etc/systemd/system/k3s.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    server \
        '--tls-san' \
        'core.xxx.cloud' \
        '--node-external-ip=146.185.xxx.xxx' \
        '--flannel-backend=wireguard-native' \
        '--flannel-external-ip' \
        '--bind-address=0.0.0.0' \
        '--kubelet-arg=allowed-unsafe-sysctls=net.ipv6.*' \
        '--kubelet-arg=allowed-unsafe-sysctls=net.ipv4.*' \

Here is my config for node-lambda agent node:

cat /etc/systemd/system/k3s-agent.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s-agent.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    agent \
        '--kubelet-arg=allowed-unsafe-sysctls=net.ipv6.*' \
        '--kubelet-arg=allowed-unsafe-sysctls=net.ipv4.*' \
        '--node-ip=10.0.1.98' \
        '--node-external-ip=109.120.xxx.xx' \

TBH, I'm not sure which direction to go at this point, so any suggestions are welcome.

@brandond (Contributor)

@manuelbuil do you have any tips on how to check wireguard connectivity between nodes?

@manuelbuil (Contributor)

@allnightlong could you run the following commands:

1 - Install wireguard-tools and then run sudo wg on the node where dnsutils is running
2 - Find the IP of the coredns pod ($COREDNSIP), then run kubectl exec -i -t dnsutils -- nslookup goo.gl $COREDNSIP and see if that works
3 - Can you ping $COREDNSIP from the node where dnsutils is running?
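The steps above can be sketched as follows, assuming the k3s-packaged coredns deployment, which carries the k8s-app=kube-dns label:

```shell
# Step 2: find the coredns pod IP via its label, as used by k3s's
# packaged coredns deployment.
COREDNSIP=$(kubectl -n kube-system get pod -l k8s-app=kube-dns \
  -o jsonpath='{.items[0].status.podIP}')
echo "$COREDNSIP"

# Query it directly from the dnsutils pod:
kubectl exec -i -t dnsutils -- nslookup goo.gl "$COREDNSIP"

# Step 3: ping it from the node where dnsutils is running:
ping -c 3 "$COREDNSIP"
```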

@allnightlong (Author) commented Sep 17, 2024

Hi @manuelbuil, thank you for the answers. Here is my cluster's state:

1. From node-lambda (in datacenter 2) I run sudo wg:
sudo wg
interface: flannel-wg
  public key: UxoKiZzDtXIwVgpYKXSucgqm52oB+k4GT2LjDK6t0mI=
  private key: (hidden)
  listening port: 51820

peer: DIMwbxQYU3uKGxnLrY0N4/hp9u9oAvQg/dQOJAYLiVk=
  endpoint: 146.185.xxx.xxx:51820
  allowed ips: 10.42.0.0/24
  latest handshake: 24 seconds ago
  transfer: 18.61 MiB received, 17.27 MiB sent
  persistent keepalive: every 25 seconds

peer: +wGbtSsm5PDnDPB9N6n/SlKi3aeiKi2gsgEyeQBs7Wc=
  endpoint: 109.120.xxx.xxx:51820
  allowed ips: 10.42.8.0/24
  latest handshake: 1 minute, 40 seconds ago
  transfer: 221.55 KiB received, 303.81 KiB sent
  persistent keepalive: every 25 seconds

146.185.xxx.xxx is the core server node (datacenter 1).
109.120.xxx.xxx is the node-kappa agent node (datacenter 2).

2. I've got the dnsutils pod running on node-lambda (datacenter 2):
kubectl exec -i -t dnsutils -- nslookup goo.gl 10.42.6.128 
;; communications error to 10.42.6.128#53: timed out
;; communications error to 10.42.6.128#53: timed out
;; communications error to 10.42.6.128#53: timed out
;; no servers could be reached


command terminated with exit code 1

If I run dnsutils on node-iota (datacenter 1), the connection is fine:

kubectl exec -i -t dnsutils -- nslookup goo.gl 10.42.6.128                                        
Server:         10.42.6.128
Address:        10.42.6.128#53

Non-authoritative answer:
Name:   goo.gl
Address: 64.233.165.138
Name:   goo.gl
Address: 64.233.165.113
Name:   goo.gl
Address: 64.233.165.100
Name:   goo.gl
Address: 64.233.165.101
Name:   goo.gl
Address: 64.233.165.139
Name:   goo.gl
Address: 64.233.165.102
Name:   goo.gl
Address: 2a00:1450:4010:c08::66
Name:   goo.gl
Address: 2a00:1450:4010:c08::64
Name:   goo.gl
Address: 2a00:1450:4010:c08::71
Name:   goo.gl
Address: 2a00:1450:4010:c08::65
3. Ping:
ping 10.42.6.128
PING 10.42.6.128 (10.42.6.128) 56(84) bytes of data.
From 10.42.9.0 icmp_seq=1 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.42.9.0 icmp_seq=2 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.42.9.0 icmp_seq=3 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.42.9.0 icmp_seq=4 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.42.9.0 icmp_seq=5 Destination Host Unreachable
ping: sendmsg: Required key not available
From 10.42.9.0 icmp_seq=6 Destination Host Unreachable
ping: sendmsg: Required key not available
^C
--- 10.42.6.128 ping statistics ---
6 packets transmitted, 0 received, +6 errors, 100% packet loss, time 5146ms
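A note on the error above: wireguard reports "Required key not available" when no configured peer's allowed-ips covers the destination address, so the kernel has no key to encrypt the packet with. A quick way to check which peer (if any) owns the coredns pod's subnet:

```shell
# List each peer's allowed-ips on the flannel wireguard interface.
# The coredns pod IP (10.42.6.128 in this thread) should fall inside
# some peer's allowed-ips range; if it doesn't, traffic to it is
# dropped with "Required key not available".
sudo wg show flannel-wg allowed-ips
```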

@brandond (Contributor)

Run those tests on all of the nodes. You need full connectivity between all cluster members, since the coredns pod may run on any node.
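Testing from every node can be sketched as a loop. This is illustrative only: the node names are taken from the earlier kubectl get nodes output, the image is the one from the Kubernetes DNS-debugging guide, and COREDNS_IP is a placeholder for the real coredns pod IP:

```shell
#!/bin/sh
# Launch one dnsutils pod per node and query coredns from each of them.
COREDNS_IP=10.42.6.128   # placeholder: substitute the real coredns pod IP
for NODE in core node-iota node-theta node-kappa node-lambda; do
  # Pin the pod to a specific node via a spec override.
  kubectl run "dnsutils-$NODE" \
    --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 \
    --restart=Never \
    --overrides="{\"apiVersion\": \"v1\", \"spec\": {\"nodeName\": \"$NODE\"}}" \
    --command -- sleep infinity
  kubectl wait --for=condition=Ready "pod/dnsutils-$NODE" --timeout=60s
  echo "--- $NODE ---"
  kubectl exec "dnsutils-$NODE" -- nslookup goo.gl "$COREDNS_IP"
done
```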

@allnightlong (Author)

You're right, @brandond: the coredns pod is on node-iota.
But I can connect to it from node-theta (dc1):

kubectl exec -i -t dnsutils -- nslookup goo.gl 10.42.6.128                                      
Server:         10.42.6.128
Address:        10.42.6.128#53

Non-authoritative answer:
Name:   goo.gl
Address: 64.233.165.138
Name:   goo.gl
Address: 64.233.165.102
Name:   goo.gl
Address: 64.233.165.100
Name:   goo.gl
Address: 64.233.165.113
Name:   goo.gl
Address: 64.233.165.101
Name:   goo.gl
Address: 64.233.165.139
Name:   goo.gl
Address: 2a00:1450:4010:c08::66
Name:   goo.gl
Address: 2a00:1450:4010:c08::64
Name:   goo.gl
Address: 2a00:1450:4010:c08::8a
Name:   goo.gl
Address: 2a00:1450:4010:c08::71

@allnightlong (Author) commented Sep 18, 2024

I think I've figured out the problem. It was a combination of two factors:

1. Only the server node in datacenter 1 had an EXTERNAL-IP configured. The other two agent nodes (iota and theta) had only an INTERNAL-IP.
2. The dns pod was running on an agent node (iota).

My expectation was that connectivity only needed to be established between each agent node and the server node, and that k3s would set up the VPN between all nodes through the server node. Apparently, each node needs a public IP for this stack to work.

Another expectation was that all system pods would run on the server node. Apparently this is not the case either.
Thank you @brandond, @manuelbuil for helping me sort things out.

In this situation my only request would be to make the documentation clearer about this, as I spent quite some time trying to figure out the problem.

Also, I didn't find any config option to move all kube-system pods to the server node. Is that possible?

@manuelbuil (Contributor)

Great that you found the problem! Thanks for taking the effort.

My expectations were, that connectivity should be established only between any agent node and server node. And k3s should setup VPN between all node through server node. Apparently, it requires each node to have public IP for this stack to work.

We can add more information to the docs, but right now it is stated that K3s uses wireguard to establish a VPN mesh for cluster traffic. What you are describing would be a star or hub-and-spoke VPN topology, not a mesh.

@allnightlong (Author)

Thank you for clearing things up for me!

@brandond (Contributor) commented Sep 18, 2024

My expectations were, that connectivity should be established only between any agent node and server node. And k3s should setup VPN between all node through server node.

As Manuel (and the docs) said, wireguard is a full mesh. What you're asking for is closer to what tailscale does; if you want a star or hub-and-spoke topology, you should look into the tailscale integration. This is covered in the docs.

Another expectation was, that all system pods would run on server node.

I'm curious where this expectation came from. There is nothing special about pods in the kube-system namespace; they will run on any available node in the cluster, the same as any other pod.

@allnightlong (Author)

In my setup, core is a command-only node where a task distributor runs. All the other nodes run CPU-intensive workloads.

I've already run into a problem where the core node was also a worker, and due to the high load k3s was very slow to respond to kubectl commands.

That's why I don't want any of the system-important pods to run anywhere but core.
I even managed to move the system-upgrade-controller pod to core with this plan spec:

spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: In
        values:
          - "true"

but I don't know how to force the dns pod to run on the main node.
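One possible approach (not an official k3s option, so treat this as a sketch) is to add a nodeSelector to the coredns deployment, relying on the node-role.kubernetes.io/control-plane=true label that k3s puts on server nodes:

```shell
# Pin the packaged coredns deployment to the control-plane node.
# Caveat: k3s re-applies its packaged manifests on server restart,
# so this live patch may be reverted.
kubectl -n kube-system patch deployment coredns --type=strategic \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/control-plane":"true"}}}}}'
```

A more durable route would be to start the server with --disable coredns and deploy your own copy of the coredns manifest with the nodeSelector baked in.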
