Failed to scrape node: remote error: tls: internal error #1480

rarecrumb · 2024-05-01T17:06:13Z

What happened: Metrics server failed to scrape a node

What you expected to happen: Successfully scrape the node

Anything else we need to know?: Deploying with the helm chart

Environment:

Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.): EKS
Container Network Setup (flannel, calico, etc.): Calico
Kubernetes version (use kubectl version): 1.29
Metrics Server manifest

spoiler for Metrics Server manifest:

      args:
      - --kubelet-insecure-tls
      containerPort: 4443
      hostNetwork:
        enabled: true

Kubelet config:

spoiler for Kubelet config:

Metrics server logs:

spoiler for Metrics Server logs:

E0501 16:40:35.362224       1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.3.10.48:10250/metrics/resource\": remote error: tls: internal error" node="ip-10-3-10-48.ec2.internal"

Status of Metrics API:

spoiler for Status of Metrics API:

kubectl describe apiservice v1beta1.metrics.k8s.io

Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       app.kubernetes.io/instance=metrics-server
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=metrics-server
              app.kubernetes.io/version=0.7.1
              argocd.argoproj.io/instance=metrics-server
              helm.sh/chart=metrics-server-3.12.1
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2023-07-13T20:41:49Z
  Resource Version:    266080474
  UID:                 59bdff53-5db0-4819-a27e-6aff8526d41e
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       base
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2024-04-30T18:18:43Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>

/kind bug


logicalhan · 2024-05-02T16:44:20Z

/kind support
/triage accepted

kanhayaKy · 2024-07-08T09:11:33Z

Any update on this? I'm having a similar issue.

Logs from metrics-server:

E0704 07:13:21.054122       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.68.188:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-68-188.ec2.internal"
E0704 07:13:36.062399       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.68.188:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-68-188.ec2.internal"
E0704 07:13:36.120301       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.94.156:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-94-156.ec2.internal"
E0704 07:13:36.128872       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.66.224:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-66-224.ec2.internal"
E0704 07:20:51.104101       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.33.165:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-33-165.ec2.internal"
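To see which nodes are affected and how often, the log lines above can be summarized per node. This is a minimal sketch: the regular expression is an assumption derived from the sample lines in this thread, not a guaranteed metrics-server log format, and `failures_by_node` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not a documented log format).
SCRAPE_FAILURE = re.compile(
    r'"Failed to scrape node" err="(?P<err>.*)" node="(?P<node>[^"]+)"'
)


def failures_by_node(log_text: str) -> Counter:
    """Count 'tls: internal error' scrape failures per node name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SCRAPE_FAILURE.search(line)
        if match and "tls: internal error" in match.group("err"):
            counts[match.group("node")] += 1
    return counts


# Three of the lines from the comment above, used as sample input.
SAMPLE = r'''
E0704 07:13:21.054122       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.68.188:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-68-188.ec2.internal"
E0704 07:13:36.062399       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.68.188:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-68-188.ec2.internal"
E0704 07:13:36.120301       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.94.156:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-94-156.ec2.internal"
'''
```

Feeding it the output of `kubectl logs` for the metrics-server pod and printing `failures_by_node(text).most_common()` shows whether the error is confined to a few nodes or spread across the fleet.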

On the node, we can see that the kubelet is listening on port 10250 and has established connections with the Prometheus operator pods:

sh-4.2$ netstat -a | grep 10250
tcp6       0      0 [::]:10250              [::]:*                  LISTEN
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-84-140.:59798 ESTABLISHED
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-46-100.:39802 ESTABLISHED
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-46-100.:44384 ESTABLISHED
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-46-100.:39806 ESTABLISHED
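A listening socket only shows that TCP works; the error in the logs happens later, during the TLS handshake. The handshake can be tried directly from a machine or pod that can reach the node. This is a rough sketch, assuming Python 3.9+; `probe_kubelet_tls` is a hypothetical helper, and certificate verification is disabled on purpose to mirror `--kubelet-insecure-tls`.

```python
import socket
import ssl


def probe_kubelet_tls(host: str, port: int = 10250, timeout: float = 5.0) -> tuple[str, str]:
    """Attempt a TLS handshake and report how far it got.

    Returns (stage, detail), where stage is one of:
      "ok"           handshake completed, detail is the TLS version
      "tls_error"    TCP connected but the handshake failed
      "socket_error" any other socket failure (refused, timeout, reset)
    """
    context = ssl.create_default_context()
    # Skip verification: we only care whether the server can finish a handshake.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return ("ok", tls.version() or "unknown")
    except ssl.SSLError as exc:  # must come before OSError (it is a subclass)
        return ("tls_error", str(exc))
    except OSError as exc:
        return ("socket_error", str(exc))
```

For example, `probe_kubelet_tls("172.31.68.188")` returning `"tls_error"` while `netstat` shows the port listening would point at the kubelet's serving certificate rather than at the network path.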

This is very strange behavior: we have not changed any config, and the issue started out of nowhere.

nathan-bowman · 2024-08-07T14:59:36Z

It seems that there is a PR in the works, but has anyone found a workaround for this issue?

NahumLitvin · 2024-09-04T11:38:06Z

I am having the same issue.

dcherniv · 2024-09-04T20:15:47Z

Check whether the CSR for the node is signed.
I ran into something similar recently: awslabs/amazon-eks-ami#1944
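One way to run that check is to list kubelet serving CSRs that have no certificate issued yet. This is a sketch under assumptions: it presumes the failing kubelets request serving certificates through the `kubernetes.io/kubelet-serving` signer, and `unissued_serving_csrs` and `load_csrs` are hypothetical helper names. It only reads the JSON that `kubectl get csr -o json` returns.

```python
import json
import subprocess

KUBELET_SERVING_SIGNER = "kubernetes.io/kubelet-serving"


def unissued_serving_csrs(csr_list: dict) -> list[dict]:
    """Return kubelet-serving CSRs that do not have an issued certificate."""
    results = []
    for item in csr_list.get("items", []):
        spec = item.get("spec", {})
        status = item.get("status", {})
        if spec.get("signerName") != KUBELET_SERVING_SIGNER:
            continue  # not a kubelet serving certificate request
        if status.get("certificate"):
            continue  # already signed
        conditions = [c.get("type") for c in status.get("conditions", [])]
        results.append({
            "name": item["metadata"]["name"],
            "requestor": spec.get("username", ""),
            # No conditions at all means nobody has approved or denied it yet.
            "state": ",".join(conditions) or "Pending",
        })
    return results


def load_csrs() -> dict:
    """Fetch all CSRs via kubectl (needs cluster access and permissions)."""
    out = subprocess.run(
        ["kubectl", "get", "csr", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Calling `unissued_serving_csrs(load_csrs())` lists the candidates. If a request from one of the failing nodes shows up as `Pending`, it can be approved with `kubectl certificate approve <name>` after confirming the requestor is the expected node.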

tgelter · 2024-10-08T14:49:15Z

Our fleet is experiencing this issue also. Would additional logs or anything else be helpful to resolve this?

luarx · 2024-10-09T21:56:25Z

Having the same issue, looking forward to a fix 🙏

Wicaeed · 2024-10-24T22:34:38Z

Same, metrics-server logs are just full of this error.

k8s-ci-robot added the kind/bug and needs-triage labels May 1, 2024

k8s-ci-robot added the kind/support and triage/accepted labels and removed the needs-triage label May 2, 2024

dongjiang1989 linked a pull request Jul 10, 2024 that will close this issue: Fix: fix remote error: tls: internal error #1522 (Open)

nathan-bowman mentioned this issue Jul 18, 2024: Splunk Operator: Autoscaling Issue splunk/splunk-operator#1352 (Closed)
