Error when using TLS min version 1.3 with Envoy Proxy #36181

Closed
joel-vaz opened this issue Sep 17, 2024 · 6 comments
Labels
area/configuration area/tls question Questions that are neither investigations, bugs, nor enhancements stale stalebot believes this issue/PR has not been touched recently

Comments

@joel-vaz

joel-vaz commented Sep 17, 2024

Hello,

I'm trying to update my service mesh (Consul - Envoy) to use TLS minimum version 1.3 on my cluster, updating from version 1.2.

  • Consul Version: 1.16.6
  • Envoy Version: 1.26.8

I confirmed that both the Consul server and Consul agent are correctly configured to use the minimum version of TLS 1.3, but the Envoy proxy that I use as a sidecar for my services is in an unhealthy status with the log:

DeltaAggregatedResources gRPC config stream to local_agent closed since 97s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION

Consul Agent Configuration:

{
  "acl": {
    "enabled": true,
    "down_policy": "async-cache",
    "default_policy": "deny",
    "tokens": {
      "default": ""
    }
  },
  "enable_central_service_config": false,
  "datacenter": "",
  "encrypt": "",
  "encrypt_verify_incoming": true,
  "encrypt_verify_outgoing": true,
  "server": false,
  "log_level": "INFO",
  "advertise_addr": "",
  "bind_addr": "0.0.0.0",
  "client_addr": "0.0.0.0",
  "data_dir": "/consul/data",
  "retry_join": [
    ""
  ],
  "auto_encrypt": {
    "tls": true,
    "ip_san": [
      ""
    ]
  },
  "tls": {  
    "defaults": {
      "ca_file": "/consul/ca.pem",
      "verify_outgoing": true,
      "verify_incoming": false,
      "tls_min_version": "TLSv1_3"
    },
    "internal_rpc": {
      "verify_server_hostname": true
    }
  },
  "leave_on_terminate": true,
  "ports": {
    "https": 8501,
    "http": -1,
    "grpc": 8502,
    "grpc_tls": 8503
  },
  "domain": "consul",
  "node_meta": {
    "env": "",
    "version": ""
  }
}

Envoy Service Configuration:

{
  "service": {
    "name": "",
    "id": "",
    "token": "",
    "address": "",
    "port": 0,
    "meta": {
      "env": "",
      "version": ""
    },
    "check": {
      "deregister_critical_service_after": "30m",
      "http": "",
      "method": "GET",
      "interval": "",
      "timeout": ""
    },
    "connect": {
      "sidecar_service": {
        "port": 21000,
        "checks": [
          {
            "name": "Connect Envoy Sidecar",
            "tcp": "",
            "interval": "10s"
          },
          {
            "id": "",
            "alias_service": ""
          }
        ],
        "proxy": {
          "config": {
            "envoy_stats_bind_addr": "0.0.0.0:19001",
            "envoy_tracing_json": "{\"http\":{\"name\":\"envoy.tracers.datadog\",\"typedConfig\":{\"@type\":\"type.googleapis.com/envoy.config.trace.v3.DatadogConfig\",\"collector_cluster\":\"datadog_8126\",\"service_name\":\"%NAME%\"}}}",
            "envoy_extra_static_clusters_json": "{\"connect_timeout\":\"3.000s\",\"dns_lookup_family\":\"V4_ONLY\",\"lb_policy\":\"ROUND_ROBIN\",\"load_assignment\":{\"cluster_name\":\"datadog_8126\",\"endpoints\":[{\"lb_endpoints\":[{\"endpoint\":{\"address\":{\"socket_address\":{\"address\":\"%ADDRESS%\",\"port_value\":8126,\"protocol\":\"TCP\"}}}}]}]},\"name\":\"datadog_8126\",\"type\":\"STRICT_DNS\"}"
          },
          "upstreams": []
        }
      }
    }
  }
}

Can I get some help on this issue, please? Did anyone go through the same? 🙏

Kind Regards,
Joel Vaz

@joel-vaz joel-vaz added the triage Issue requires triage label Sep 17, 2024
@zuercher zuercher added question Questions that are neither investigations, bugs, nor enhancements area/tls area/configuration and removed triage Issue requires triage labels Sep 17, 2024
@zuercher
Member

Envoy's default maximum TLS version for clients is TLS 1.2 (https://www.envoyproxy.io/docs/envoy/v1.31.1/api-v3/extensions/transport_sockets/tls/v3/common.proto). In your Envoy configuration, you can set it to TLS 1.3 and the connection should succeed.

The docs I linked are for the latest version, but that's true in Envoy 1.26 as well. That version is no longer supported. You should upgrade.

@joel-vaz
Author

joel-vaz commented Sep 17, 2024

Hello @zuercher, first, thanks for your quick reply!

Secondly, could you help me understand where to configure that parameter? My Envoy configuration is generated by the consul connect envoy command. The snippet below is part of our Docker container entrypoint script; it uses the service configuration I sent above as a base template and fills in the correct env variables:

set_proxy_configuration()
{
  ## Env variables code
  ##

  base_renderers=$(jq '.service.connect.sidecar_service.proxy.upstreams = '"${CONSUL_SERVICE_UPSTREAMS}"' |
      .service.name = "'${SERVICE_NAME}'" |
      .service.id = "'${SERVICE_ID}'" |
      .service.token = "'${CONSUL_HTTP_TOKEN}'" |
      .service.address = "'${CONTAINER_IP}'" |
      .service.port = '${SERVICE_PORT}' |
      .service.meta.env = "'${DD_ENV}'" |
      .service.meta.version = "'${DD_VERSION}'" |
      .service.connect.sidecar_service.port = '${SIDECAR_PORT}' |
      .service.check.http = "'${SERVICE_HEALTH_CHECK}'" |
      .service.check.interval = "'${SERVICE_HEALTH_CHECK_INTERVAL}'" |
      .service.check.timeout = "'${SERVICE_HEALTH_CHECK_TIMEOUT}'" |
      .service.connect.sidecar_service.checks[0].tcp = "'${SIDECAR_HEALTH_CHECK}'" |
      .service.connect.sidecar_service.checks[1].id = "'${SERVICE_ID}'-alias" |
      .service.connect.sidecar_service.checks[1].alias_service = "'${SERVICE_ID}'" |
      .service.connect.sidecar_service.proxy.config.envoy_tracing_json |=gsub("%NAME%";"'$DD_SERVICE'") |
      .service.connect.sidecar_service.proxy.config.envoy_extra_static_clusters_json |=gsub("%ADDRESS%";"'$EC2_HOST_ADDRESS'")' ./service_config.json )
  echo "Base Renderers configuration: $base_renderers"

  # Wait until Consul can be contacted
  until curl -s -k ${CONSUL_HTTP_ADDR}/v1/status/leader | grep ***; do
    echo "Waiting for Consul to start at ${CONSUL_HTTP_ADDR}."
    sleep 1
  done

  echo "Registering service with consul ${SERVICE_CONFIG_FILE}."
  consul services register ${SERVICE_CONFIG_FILE}

  consul connect envoy -sidecar-for=${SERVICE_ID} -grpc-ca-file=${CONSUL_CACERT} $ENVOY_DEBUG &
}

I've been trying for a few days to make Envoy use TLS 1.3 as its minimum version, including going through the documentation you sent above, but to no effect. I always end up with the TLS version mismatch.

Concerning the version, we plan to move to a stable release of Envoy, but we are doing incremental upgrades since this service was left without updates for a few years.

Kind Regards,
Joel Vaz

@zuercher
Member

Ultimately, the Envoy configuration must have a static cluster defined which specifies how to connect to the xDS server (provided by Consul). That Envoy cluster definition will have a transport_socket field configured with an UpstreamTlsContext object. Within that UpstreamTlsContext, you'll need to make sure that common_tls_context.tls_params.tls_maximum_protocol_version is set to TLSv1_3. How to get consul connect envoy to generate that config is a question for a Consul forum.
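
For illustration only, a static cluster with that setting might look roughly like the sketch below. The cluster name local_agent comes from the log line earlier in this thread; the address 127.0.0.1, the port 8503 (the grpc_tls port from the agent configuration above), the connect_timeout value, and the CA path /consul/ca.pem are assumptions and may not match what consul connect envoy actually generates. The HTTP/2 protocol options are included because the xDS stream is gRPC.

```json
{
  "name": "local_agent",
  "type": "STATIC",
  "connect_timeout": "1s",
  "typed_extension_protocol_options": {
    "envoy.extensions.upstreams.http.v3.HttpProtocolOptions": {
      "@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions",
      "explicit_http_config": {
        "http2_protocol_options": {}
      }
    }
  },
  "load_assignment": {
    "cluster_name": "local_agent",
    "endpoints": [
      {
        "lb_endpoints": [
          {
            "endpoint": {
              "address": {
                "socket_address": {
                  "address": "127.0.0.1",
                  "port_value": 8503
                }
              }
            }
          }
        ]
      }
    ]
  },
  "transport_socket": {
    "name": "envoy.transport_sockets.tls",
    "typed_config": {
      "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
      "common_tls_context": {
        "tls_params": {
          "tls_minimum_protocol_version": "TLSv1_3",
          "tls_maximum_protocol_version": "TLSv1_3"
        },
        "validation_context": {
          "trusted_ca": {
            "filename": "/consul/ca.pem"
          }
        }
      }
    }
  }
}
```

The relevant part is tls_params inside the UpstreamTlsContext of the cluster that talks to the xDS server. Setting only tls_minimum_protocol_version would leave the client maximum at its TLS 1.2 default, so the maximum has to be raised as well.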

@joel-vaz
Author

joel-vaz commented Sep 19, 2024

Hey @zuercher

I updated my service configuration file for the envoy proxy:

{
  "service": {
    "name": <value_hidden>,
    "id": <value_hidden>,
    "token": <value_hidden>,
    "address": <value_hidden>,
    "port": <value_hidden>,
    "meta": {
      "env": <value_hidden>,
      "version": <value_hidden>
    },
    "check": {
      "deregister_critical_service_after": "30m",
      "http": <value_hidden>,
      "method": "GET",
      "interval": "1s",
      "timeout": "1s"
    },
    "connect": {
      "sidecar_service": {
        "port": <value_hidden>,
        "checks": [
          {
            "name": "Connect Envoy Sidecar",
            "tcp": <value_hidden>,
            "interval": "10s"
          },
          {
            "id": "<value_hidden>",
            "alias_service": "<value_hidden>"
          }
        ],
        "proxy": {
          "config": {
            "envoy_stats_bind_addr": "<value_hidden>",
            "envoy_tracing_json": "{\"http\":{\"name\":\"envoy.tracers.datadog\",\"typedConfig\":{\"@type\":\"type.googleapis.com/envoy.config.trace.v3.DatadogConfig\",\"collector_cluster\":\"datadog_8126\",\"service_name\":\"<value_hidden>\"}}}",
            "envoy_extra_static_clusters_json": "{\"connect_timeout\":\"3.000s\",\"dns_lookup_family\":\"V4_ONLY\",\"lb_policy\":\"ROUND_ROBIN\",\"load_assignment\":{\"cluster_name\":\"datadog_8126\",\"endpoints\":[{\"lb_endpoints\":[{\"endpoint\":{\"address\":{\"socket_address\":{\"address\":\"<value_hidden>\",\"port_value\":<value_hidden>,\"protocol\":\"TCP\"}}}}]}]},\"name\":\"datadog_8126\",\"type\":\"STRICT_DNS\"}",
            "common_tls_context": {
              "tls_params": {
                "tls_minimum_protocol_version": "TLSv1_3"
              }
            }
          },
          "upstreams": []
        }
      }
    }
  }
}

I still get the same error, with Envoy defaulting to TLS 1.2. Is there something missing in the Envoy service configuration file? 🤔

Additional logs:

[2024-09-19 09:51:55.555][37][info][main] [source/server/server.cc:456] HTTP header map info:
[2024-09-19 09:51:55.556][37][info][main] [source/server/server.cc:459]   request header map: 672 bytes: :authority,:method,:path,:protocol,:scheme,accept,accept-encoding,access-control-request-headers,access-control-request-method,access-control-request-private-network,authentication,authorization,cache-control,cdn-loop,connection,content-encoding,content-length,content-type,expect,grpc-accept-encoding,grpc-timeout,if-match,if-modified-since,if-none-match,if-range,if-unmodified-since,keep-alive,origin,pragma,proxy-connection,proxy-status,referer,te,transfer-encoding,upgrade,user-agent,via,x-client-trace-id,x-envoy-attempt-count,x-envoy-decorator-operation,x-envoy-downstream-service-cluster,x-envoy-downstream-service-node,x-envoy-expected-rq-timeout-ms,x-envoy-external-address,x-envoy-force-trace,x-envoy-hedge-on-per-try-timeout,x-envoy-internal,x-envoy-ip-tags,x-envoy-is-timeout-retry,x-envoy-max-retries,x-envoy-original-path,x-envoy-original-url,x-envoy-retriable-header-names,x-envoy-retriable-status-codes,x-envoy-retry-grpc-on,x-envoy-retry-on,x-envoy-upstream-alt-stat-name,x-envoy-upstream-rq-per-try-timeout-ms,x-envoy-upstream-rq-timeout-alt-response,x-envoy-upstream-rq-timeout-ms,x-envoy-upstream-stream-duration-ms,x-forwarded-client-cert,x-forwarded-for,x-forwarded-host,x-forwarded-port,x-forwarded-proto,x-ot-span-context,x-request-id
[2024-09-19 09:51:55.556][37][info][main] [source/server/server.cc:459]   request trailer map: 120 bytes:
[2024-09-19 09:51:55.556][37][info][main] [source/server/server.cc:459]   response header map: 432 bytes: :status,access-control-allow-credentials,access-control-allow-headers,access-control-allow-methods,access-control-allow-origin,access-control-allow-private-network,access-control-expose-headers,access-control-max-age,age,cache-control,connection,content-encoding,content-length,content-type,date,etag,expires,grpc-message,grpc-status,keep-alive,last-modified,location,proxy-connection,proxy-status,server,transfer-encoding,upgrade,vary,via,x-envoy-attempt-count,x-envoy-decorator-operation,x-envoy-degraded,x-envoy-immediate-health-check-fail,x-envoy-ratelimited,x-envoy-upstream-canary,x-envoy-upstream-healthchecked-cluster,x-envoy-upstream-service-time,x-request-id
[2024-09-19 09:51:55.556][37][info][main] [source/server/server.cc:459]   response trailer map: 144 bytes: grpc-message,grpc-status
[2024-09-19 09:51:55.562][37][info][main] [source/server/server.cc:827] runtime: layers:
  - name: base
    static_layer:
      re2.max_program_size.error_level: 1048576
[2024-09-19 09:51:55.562][37][info][admin] [source/server/admin/admin.cc:66] admin address: 127.0.0.1:19000
[2024-09-19 09:51:55.563][37][info][config] [source/server/configuration_impl.cc:131] loading tracing configuration
[2024-09-19 09:51:55.563][37][info][config] [source/server/configuration_impl.cc:142]   validating default server-wide tracing driver: envoy.tracers.datadog
[2024-09-19 09:51:55.563][37][info][config] [source/server/configuration_impl.cc:91] loading 0 static secret(s)
[2024-09-19 09:51:55.563][37][info][config] [source/server/configuration_impl.cc:97] loading 3 cluster(s)
[2024-09-19 09:51:55.612][37][info][config] [source/server/configuration_impl.cc:101] loading 1 listener(s)
[2024-09-19 09:51:55.614][37][info][config] [source/server/configuration_impl.cc:113] loading stats configuration
[2024-09-19 09:51:55.614][37][info][runtime] [source/common/runtime/runtime_impl.cc:463] RTDS has finished initialization
[2024-09-19 09:51:55.614][37][info][upstream] [source/common/upstream/cluster_manager_impl.cc:221] cm init: initializing cds
[2024-09-19 09:51:55.614][37][warning][main] [source/server/server.cc:802] there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
[2024-09-19 09:51:55.615][37][info][main] [source/server/server.cc:923] starting main dispatch loop
[2024-09-19 09:52:48.043][37][warning][config] [./source/common/config/grpc_stream.h:191] DeltaAggregatedResources gRPC config stream to local_agent closed since 52s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION

Thank you so much for the help you are providing 🙏


This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Oct 19, 2024

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 26, 2024