Envoy cannot connect to the XDS Server #36150
Difficult to say, but perhaps the first entry in the DNS result of
It will continue to try to connect, but will eventually give up waiting for configuration and proceed with whatever static configuration may be available. See the initial_fetch_timeout configuration on ConfigSource.
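For reference, a minimal sketch of where initial_fetch_timeout sits in the v3 API (the delta/ADS wiring, cluster name, and timeout value here are illustrative, not taken from the reporter's config):

```yaml
dynamic_resources:
  ads_config:
    api_type: DELTA_GRPC              # a delta stream, matching the DeltaAggregatedResources log line
    transport_api_version: V3
    grpc_services:
      - envoy_grpc:
          cluster_name: xds_cluster   # hypothetical cluster name
  cds_config:
    ads: {}
    resource_api_version: V3
    initial_fetch_timeout: 15s        # illustrative: stop blocking initialization after 15s
```

If initial_fetch_timeout is left unset it defaults to 15s; setting it to 0 disables the timeout and Envoy will wait indefinitely for the first response.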
I think each retry will be a new connection attempt. Whether it chooses the same host depends on the cluster's endpoints and the load balancing policy. Here it goes back to the DNS result and will always choose the first host in the most recent DNS response (this is the definition of LOGICAL_DNS). You might consider whether
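To make the LOGICAL_DNS point concrete, here is a sketch of an xDS cluster definition (hostname and port are hypothetical). STRICT_DNS, by contrast, treats every address in the DNS response as a distinct endpoint, which can be a better fit for a Kubernetes headless service that resolves to several pod IPs:

```yaml
static_resources:
  clusters:
    - name: xds_cluster
      type: LOGICAL_DNS             # always dials the first address in the latest DNS response;
                                    # STRICT_DNS would use all returned addresses as endpoints
      connect_timeout: 5s
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}   # gRPC to the XDS server requires HTTP/2
      load_assignment:
        cluster_name: xds_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: xds-server.default.svc.cluster.local   # hypothetical
                      port_value: 18000                               # hypothetical
```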
That circuit breaker doesn't apply here, partly because the max-retries circuit breaker is the maximum number of concurrent retries on a cluster (e.g. retries as configured in an HttpConnectionManager retry_policy), and partly because I don't think we do the circuit breaker accounting in the XDS client code. In any event, XDS only ever has a single gRPC request open to an XDS server at a given time, so even if the circuit breaker accounting were taking place, it would never hit the limits.
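For completeness, this is where that circuit breaker is configured (values illustrative); as noted above, it bounds concurrent router-level retries on a data-plane cluster and does not gate the single XDS stream:

```yaml
clusters:
  - name: some_backend            # hypothetical data-plane cluster
    circuit_breakers:
      thresholds:
        - priority: DEFAULT
          max_retries: 3          # caps concurrent retries from router retry policies,
                                  # not XDS stream reconnect attempts
```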
Hey @zuercher,
The XDS server was restarted and Envoy got disconnected, but Envoy was unable to reconnect to the XDS server for around 1.5 hours. Once Envoy itself was restarted, it was able to connect back to the XDS server.
Envoy's logs contain only this warning message, logged again and again:
[2024-09-11 09:43:24.904][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:190] DeltaAggregatedResources gRPC config stream to [] closed since 49223s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: 113
I understand that Envoy uses a backoff_strategy for retries, and that this warning message is logged when the error still persists after a backoff cycle.
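For what it's worth, the default reconnect backoff for the xDS gRPC stream is jittered exponential, starting around a 500ms base and capping at 30s, so during that 1.5 hour window Envoy should have been attempting a reconnect at least every ~30s. On recent Envoy versions the backoff can be tuned via a retry_policy on the envoy_grpc target; a hedged sketch (field availability depends on your Envoy version, and the values are illustrative):

```yaml
api_config_source:
  api_type: DELTA_GRPC
  transport_api_version: V3
  grpc_services:
    - envoy_grpc:
        cluster_name: xds_cluster   # hypothetical
        retry_policy:               # availability depends on Envoy version
          retry_back_off:
            base_interval: 1s       # illustrative; default base is ~500ms
            max_interval: 10s       # illustrative; default cap is 30s
```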
I have a few doubts about this:
- Metrics: upstream_cx_connect_fail has a value of 3034 and upstream_cx_overflow is 0.
- XDS server config: it is running as a headless service in Kubernetes.