Failed to scrape node: remote error: tls: internal error #1480

Open
rarecrumb opened this issue May 1, 2024 · 8 comments · May be fixed by #1522
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/support Categorizes issue or PR as a support question. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@rarecrumb

What happened: Metrics server failed to scrape a node

What you expected to happen: Successfully scrape the node

Anything else we need to know?: Deploying with the helm chart

Environment:

  • Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.): EKS

  • Container Network Setup (flannel, calico, etc.): Calico

  • Kubernetes version (use kubectl version): 1.29

  • Metrics Server manifest

spoiler for Metrics Server manifest:
      args:
      - --kubelet-insecure-tls
      containerPort: 4443
      hostNetwork:
        enabled: true
  • Kubelet config:
spoiler for Kubelet config:
  • Metrics server logs:
spoiler for Metrics Server logs:
E0501 16:40:35.362224       1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.3.10.48:10250/metrics/resource\": remote error: tls: internal error" node="ip-10-3-10-48.ec2.internal"
  • Status of Metrics API:
spoiler for Status of Metrics API:
kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       app.kubernetes.io/instance=metrics-server
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=metrics-server
              app.kubernetes.io/version=0.7.1
              argocd.argoproj.io/instance=metrics-server
              helm.sh/chart=metrics-server-3.12.1
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2023-07-13T20:41:49Z
  Resource Version:    266080474
  UID:                 59bdff53-5db0-4819-a27e-6aff8526d41e
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       base
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2024-04-30T18:18:43Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 1, 2024
@logicalhan

/kind support
/triage accepted

@k8s-ci-robot k8s-ci-robot added kind/support Categorizes issue or PR as a support question. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 2, 2024
@kanhayaKy

Any update on this?
I'm having a similar issue.

Logs from metrics-server:

E0704 07:13:21.054122       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.68.188:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-68-188.ec2.internal"
E0704 07:13:36.062399       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.68.188:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-68-188.ec2.internal"
E0704 07:13:36.120301       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.94.156:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-94-156.ec2.internal"
E0704 07:13:36.128872       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.66.224:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-66-224.ec2.internal"
E0704 07:20:51.104101       1 scraper.go:140] "Failed to scrape node" err="Get \"https://172.31.33.165:10250/metrics/resource\": remote error: tls: internal error" node="ip-172-31-33-165.ec2.internal"


On the node, we can see that it is listening on port 10250 and has established connections with the Prometheus operator pods:

sh-4.2$ netstat -a | grep 10250
tcp6       0      0 [::]:10250              [::]:*                  LISTEN
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-84-140.:59798 ESTABLISHED
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-46-100.:39802 ESTABLISHED
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-46-100.:44384 ESTABLISHED
tcp6       0      0 ip-172-31-34-243.:10250 ip-172-31-46-100.:39806 ESTABLISHED

This is very strange behavior: we have not changed any configuration, and the issue started appearing out of nowhere.
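For anyone triaging the same error, here is a minimal sketch for seeing which nodes are affected and how often. It assumes the log lines have the same format as the ones quoted in this comment; the function name `failing_nodes` is hypothetical and not part of metrics-server or kubectl.

```shell
# Hypothetical helper (not part of metrics-server): reads metrics-server
# log lines on stdin and prints "<count> <node>" for every node that
# produced a "Failed to scrape node" error.
failing_nodes() {
  # Keep only scrape-failure lines and extract the value of node="...".
  sed -n '/Failed to scrape node/s/.*node="\([^"]*\)".*/\1/p' \
    | sort \
    | uniq -c \
    | awk '{print $1, $2}'   # normalize the padding that uniq -c adds
}
```

It could be fed from something like `kubectl logs -n <namespace> deploy/metrics-server | failing_nodes`, where the namespace and the deployment name depend on how the chart was installed. If only some nodes show up, that may point at a per-node problem rather than at metrics-server itself.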

@nathan-bowman

It seems that there is a PR in the works, but has anyone found a workaround for this issue?

@NahumLitvin

I am having the same issue.

@dcherniv

dcherniv commented Sep 4, 2024

Check whether the CSR for the node is signed.
I ran into something similar recently: awslabs/amazon-eks-ami#1944
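A rough sketch of that check, assuming the kubelet obtains its serving certificate through the cluster's CertificateSigningRequest API and that `kubectl get csr --no-headers` prints the condition in the last column (the default table layout in recent kubectl versions). The function name `unsigned_csrs` is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get csr --no-headers` output on stdin
# and prints the name and condition of every CSR that is not both approved
# and issued (for example "Pending", "Denied", or "Approved" without a
# certificate).
unsigned_csrs() {
  awk 'NF > 0 && $NF != "Approved,Issued" {print $1, $NF}'
}

# Example usage against a live cluster (commented out: needs cluster access):
#   kubectl get csr --no-headers | unsigned_csrs
#   kubectl describe csr <name>            # inspect requestor and signer first
#   kubectl certificate approve <name>     # only if the request is legitimate
```

If a node's serving CSR shows up in this list, the kubelet may have no valid serving certificate to present, which would be consistent with the `tls: internal error` seen by metrics-server. Whether that is the cause here is unconfirmed.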

@tgelter

tgelter commented Oct 8, 2024

Our fleet is also experiencing this issue. Would additional logs or anything else help resolve it?

@luarx

luarx commented Oct 9, 2024

Having the same issue, looking forward to a fix 🙏

@Wicaeed

Wicaeed commented Oct 24, 2024

Same here; the metrics-server logs are just full of this error.

10 participants