It appears that confd does not re-resolve etcd SRV records after startup. This makes SRV support quite dangerous to use if you ever intend to change the contents of the record.
We had many confds across our fleet 'stuck' attempting to look up the previous, decommissioned members of our etcd cluster:
Aug 03 13:43:57 deploy1002 confd[32274]: 2022-08-03T13:43:57Z deploy1002 /usr/bin/confd[32274]: ERROR client: etcd cluster is unavailable or misconfigured; error #0: dial tcp: lookup conf1004.eqiad.wmnet on 10.3.0.1:53: no such host
Aug 03 13:43:57 deploy1002 confd[32274]: ; error #1: dial tcp: lookup conf1006.eqiad.wmnet on 10.3.0.1:53: no such host
Aug 03 13:43:57 deploy1002 confd[32274]: ; error #2: dial tcp: lookup conf1005.eqiad.wmnet on 10.3.0.1:53: no such host
Meanwhile, the SRV record had looked like this in our DNS for at least a week:
_etcd._tcp.eqiad.wmnet has SRV record 0 1 4001 conf1008.eqiad.wmnet.
_etcd._tcp.eqiad.wmnet has SRV record 0 1 4001 conf1009.eqiad.wmnet.
_etcd._tcp.eqiad.wmnet has SRV record 0 1 4001 conf1007.eqiad.wmnet.
This is with version:
confd 0.16.0 (Git SHA: , Go Version: go1.11.6)
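To illustrate the behaviour I would expect, here is a minimal Go sketch of periodic SRV re-resolution. It is not confd's actual code: `formatEndpoint`, `lookupEndpoints`, `sameEndpoints`, `watchEndpoints`, the 2-second interval, and the `onChange` callback are all hypothetical names and values chosen for the example. Only `net.DefaultResolver.LookupSRV` comes from the Go standard library. The idea is to keep the last known member list when a lookup fails, and to notify the client only when the resolved list actually changes.

```go
package main

import (
	"context"
	"log"
	"net"
	"sort"
	"strconv"
	"strings"
	"time"
)

// formatEndpoint turns one SRV target and port into "host:port",
// dropping the trailing dot of the fully qualified name.
func formatEndpoint(target string, port uint16) string {
	host := strings.TrimSuffix(target, ".")
	return net.JoinHostPort(host, strconv.Itoa(int(port)))
}

// lookupEndpoints queries _etcd._tcp.<domain> and returns a sorted list
// of "host:port" strings.
func lookupEndpoints(ctx context.Context, domain string) ([]string, error) {
	_, records, err := net.DefaultResolver.LookupSRV(ctx, "etcd", "tcp", domain)
	if err != nil {
		return nil, err
	}
	endpoints := make([]string, 0, len(records))
	for _, r := range records {
		endpoints = append(endpoints, formatEndpoint(r.Target, r.Port))
	}
	sort.Strings(endpoints)
	return endpoints, nil
}

// sameEndpoints reports whether two sorted endpoint lists are identical.
func sameEndpoints(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// watchEndpoints re-resolves the SRV record once per interval and calls
// onChange only when the member list differs from the last one seen.
// A failed or empty lookup keeps the previous list.
func watchEndpoints(ctx context.Context, domain string, interval time.Duration, onChange func([]string)) {
	var current []string
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		next, err := lookupEndpoints(ctx, domain)
		switch {
		case err != nil:
			log.Printf("SRV lookup for %s failed, keeping %v: %v", domain, current, err)
		case len(next) > 0 && !sameEndpoints(current, next):
			current = next
			onChange(next)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func main() {
	// Hypothetical wiring: a real client would rebuild its etcd
	// connection inside the callback instead of just logging.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	watchEndpoints(ctx, "eqiad.wmnet", 2*time.Second, func(endpoints []string) {
		log.Printf("etcd endpoints changed: %v", endpoints)
	})
}
```

With something like this in place, the switch from conf1004/conf1005/conf1006 to conf1007/conf1008/conf1009 would have been picked up on the next tick rather than requiring a restart of every confd.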