XDS DeltaVirtualHosts gRPC config stream to xxx closed: grpc: received message larger than max #36169

Open
dmavrommatis opened this issue Sep 16, 2024 · 2 comments
Labels
area/xds question Questions that are neither investigations, bugs, nor enhancements

Comments

dmavrommatis commented Sep 16, 2024

Title: xDS server sends larger message than max

Description:
I have an Envoy configuration that uses RDS and has more than 50k routes. Even though I am using DELTA_GRPC, the proxy sometimes ends up unable to receive any new updates, with the error message:

[2024-09-16 17:17:12.814][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:190] DeltaVirtualHosts gRPC config stream to xds_cluster closed since 105s ago: 8, grpc: received message larger than max (21173076 vs. 4194304)

Locally, I created a gRPC client with a higher limit, grpc.MaxCallRecvMsgSize(math.MaxInt32), and it worked, but I am curious whether this is something we want to be able to configure in Envoy as well.
Any idea why using the delta API is not enough, and why it batches updates larger than gRPC can handle?

I also saw that defaultServerMaxSendMessageSize = math.MaxInt32 while defaultClientMaxReceiveMessageSize = 1024 * 1024 * 4, which is exactly the mismatch that causes the issue to appear.
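
As a rough check of the numbers in the log line, here is a minimal Go sketch of the arithmetic. The constant mirrors the 1024 * 1024 * 4 value quoted above. `minMessages` is a hypothetical helper written for this illustration and is not part of grpc-go, go-control-plane, or Envoy.

```go
package main

import "fmt"

// Receive limit quoted in the issue: 4 MiB.
const defaultClientMaxReceiveMessageSize = 1024 * 1024 * 4

// minMessages is a hypothetical helper: it returns the smallest number of
// messages needed to carry total bytes when no single message may exceed
// limit bytes. It ignores any per-message framing overhead.
func minMessages(total, limit int) int {
	if total <= 0 || limit <= 0 {
		return 0
	}
	return (total + limit - 1) / limit // ceiling division
}

func main() {
	const observed = 21173076 // size reported in the Envoy log line

	fmt.Println(defaultClientMaxReceiveMessageSize)                     // 4194304
	fmt.Println(observed > defaultClientMaxReceiveMessageSize)          // true
	fmt.Println(minMessages(observed, defaultClientMaxReceiveMessageSize)) // 6
}
```

So the rejected message was roughly five times the limit, and the same payload would need at least six messages to fit under it.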

Repro steps:

  1. Use a simple cache xDS server from https://github.com/envoyproxy/go-control-plane
  2. Add tens of thousands of routes to RDS
  3. The error message appears

Config:
envoy.yaml

    admin:
      access_log_path: /dev/null
      address:
        socket_address:
          address: 0.0.0.0
          port_value: {{ .Values.proxy.config.info_port }}
    dynamic_resources:
      ads_config:
        api_type: DELTA_GRPC
        transport_api_version: V3
        grpc_services:
          - envoy_grpc:
              cluster_name: xds_cluster
        set_node_on_first_message_only: true
      cds_config:
        resource_api_version: V3
        ads: { }
      lds_config:
        path_config_source:
          path: {{ .Values.configgen.config.lds_path }}
    node:
      cluster: envoy-cluster
      id: {{ .Values.global.xdsNodeID }}
    static_resources:
      clusters:
        - name: xds_cluster
          type: STRICT_DNS
          connect_timeout: 10s
          load_assignment:
            cluster_name: xds_cluster
            endpoints:
              - lb_endpoints:
                  - endpoint:
                      address:
                        socket_address:
                          address: {{ .Values.global.xdsAddress }}
                          port_value: {{ .Values.global.xdsPort }}
          http2_protocol_options: { }
    layered_runtime:
      layers:
        - name: runtime-0
          rtds_layer:
            rtds_config:
              resource_api_version: V3
              api_config_source:
                transport_api_version: V3
                api_type: DELTA_GRPC
                grpc_services:
                  envoy_grpc:
                    cluster_name: xds_cluster
            name: runtime-0

lds.yaml

    version_info: "0"
    resources:
      - "@type": "type.googleapis.com/envoy.config.listener.v3.Listener"
        name: http_listener
        address:
          socket_address:
            address: 0.0.0.0
            port_value: 80
        filter_chains:
          - filters:
              - name: envoy.filters.network.http_connection_manager
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                  stat_prefix: http
                  codec_type: AUTO
                  server_name: "abc"
                  strip_any_host_port: true
                  rds:
                    route_config_name: "{{ .Route_config_name }}"
                    config_source:
                      resource_api_version: V3
                      api_config_source:
                        api_type: DELTA_GRPC
                        transport_api_version: V3
                        grpc_services:
                          - envoy_grpc:
                              cluster_name: xds_cluster
                        set_node_on_first_message_only: true
                  http_filters:
                    - name: envoy.filters.http.router
                      typed_config:
                        "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

Logs:

[2024-09-16 17:17:12.814][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:190] DeltaVirtualHosts gRPC config stream to xds_cluster closed since 105s ago: 8, grpc: received message larger than max (21173076 vs. 4194304)
@dmavrommatis dmavrommatis added bug triage Issue requires triage labels Sep 16, 2024
@dmavrommatis dmavrommatis changed the title XDS DeltaVirtualHosts gRPC config stream to xxx closed XDS DeltaVirtualHosts gRPC config stream to xxx closed: grpc: received message larger than max Sep 16, 2024
@zuercher zuercher added question Questions that are neither investigations, bugs, nor enhancements area/xds and removed bug triage Issue requires triage labels Sep 17, 2024
@zuercher
Member

Based on my knowledge of delta xDS, we could probably split up the subscription requests to avoid hitting the default receive message size limit. But that limit is somewhat arbitrary. I don't recall it showing up in the gRPC spec, and there's nothing to prevent a different gRPC server implementation from choosing a different limit. I think this ends up being a well-meaning default for servers with untrusted clients that trips up systems with trusted clients as their scale grows.

@dmavrommatis
Author

> Based on my knowledge of delta xDS, we could probably split up the subscription requests to avoid hitting the default receive message size limit. But that limit is somewhat arbitrary. I don't recall it showing up in the gRPC spec, and there's nothing to prevent a different gRPC server implementation from choosing a different limit. I think this ends up being a well-meaning default for servers with untrusted clients that trips up systems with trusted clients as their scale grows.

I am using the https://github.com/envoyproxy/go-control-plane implementation of the xDS server, and it looks like it doesn't split up the requests and simply sends all the deltas in full, regardless of size.

In any case, the 4MB limit on the receiving end of Envoy seems very low. I haven't used or seen other implementations of the control plane (e.g. https://github.com/envoyproxy/java-control-plane), so it might be only the Go one that is problematic and does not split up messages.
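
To illustrate what splitting up messages could look like, here is a minimal Go sketch that groups serialized resources into batches under a byte budget. It is not how go-control-plane or Envoy behaves today. The `resource` type and `splitBySize` function are hypothetical names invented for this example, and the sketch counts only payload bytes, ignoring protobuf and gRPC framing overhead.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for one serialized xDS resource.
type resource struct {
	Name string
	Data []byte
}

// splitBySize groups resources, in order, into batches whose summed payload
// size does not exceed maxBytes. A single resource larger than maxBytes is
// placed in a batch of its own, because it cannot be divided here.
func splitBySize(resources []resource, maxBytes int) [][]resource {
	var batches [][]resource
	var current []resource
	size := 0
	for _, r := range resources {
		// Flush the current batch if adding r would exceed the budget.
		if len(current) > 0 && size+len(r.Data) > maxBytes {
			batches = append(batches, current)
			current, size = nil, 0
		}
		current = append(current, r)
		size += len(r.Data)
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	rs := []resource{
		{Name: "a", Data: make([]byte, 3)},
		{Name: "b", Data: make([]byte, 3)},
		{Name: "c", Data: make([]byte, 5)},
		{Name: "d", Data: make([]byte, 1)},
	}
	for i, b := range splitBySize(rs, 6) {
		fmt.Println("batch", i, "resources", len(b))
	}
}
```

A real implementation would need to leave headroom below the limit for message framing and would have to decide how to handle a single resource that exceeds the limit on its own.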
