Jenkins deadlock Error #4380

Closed
amitkbaid opened this issue Nov 7, 2024 · 2 comments

Comments

@amitkbaid

Service(s)

infra.ci.jenkins.io

Summary

I am running Jenkins 2.401 on Kubernetes, with the controller on JDK 11. The setup uses two inbound agents to run different pipeline jobs:

inbound agent 1 uses the 3198.v03a_401881f3e-1-jdk17 Docker image (JDK 17)
inbound agent 2 uses the 3107.v665000b_51092-15 Docker image (JDK 11)

We intermittently see nodes go offline as a job starts, on both agents. There is no discernible pattern: the error may appear after 15 pipeline submissions, or not appear at all for 30 submissions.

Below is the message written to the controller log. The agent log shows no errors.

2024-10-01 04:51:46.333+0000 [id=7688] WARNING j.m.api.Metrics$HealthChecker#execute: Some health checks are reporting as unhealthy: [thread-deadlock : [Computer.threadPoolForRemoting [#1] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@4acb9e4c (owned by jenkins.util.Timer [#1]): at java.base@11.0.19/jdk.internal.misc.Unsafe.park(Native Method)
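Because the failure is intermittent, one way to look for a pattern is to collect every occurrence of this warning and check whether the same pair of threads (here, the waiting Computer.threadPoolForRemoting thread and the owning jenkins.util.Timer thread) is involved each time. The minimal Python sketch below assumes every occurrence has the single-line shape of the sample above; the regex, `parse_deadlock_line`, and `summarize` are hypothetical helpers inferred from this one log line, not part of Jenkins or the Metrics plugin.

```python
import re
from collections import Counter

# Hypothetical pattern, inferred from the single controller-log sample in
# this report: timestamp, then the "thread-deadlock" health-check detail.
DEADLOCK_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}) .*?"
    r"thread-deadlock : \[(?P<waiter>.+?) locked on (?P<lock>\S+) "
    r"\(owned by (?P<owner>.+?)\)"
)

# The warning exactly as it appears in this issue.
SAMPLE = (
    "2024-10-01 04:51:46.333+0000 [id=7688] WARNING "
    "j.m.api.Metrics$HealthChecker#execute: Some health checks are reporting "
    "as unhealthy: [thread-deadlock : [Computer.threadPoolForRemoting [#1] "
    "locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@4acb9e4c "
    "(owned by jenkins.util.Timer [#1]): at "
    "java.base@11.0.19/jdk.internal.misc.Unsafe.park(Native Method)"
)


def parse_deadlock_line(line):
    """Return a dict with ts, waiter, lock and owner, or None if no match."""
    m = DEADLOCK_RE.search(line)
    return m.groupdict() if m else None


def summarize(lines):
    """Count (waiting thread, owning thread) pairs across many log lines."""
    pairs = Counter()
    for line in lines:
        hit = parse_deadlock_line(line)
        if hit:
            pairs[(hit["waiter"], hit["owner"])] += 1
    return pairs
```

Feeding the whole controller log through `summarize` shows at a glance whether every incident involves the same waiter/owner pair or different threads each time.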

I found a recommendation in the Jenkins JIRA to upgrade the Pipeline plugin; however, we are already running the latest version of the plugin.

We plan to upgrade Jenkins and the agents to JDK 17 starting next year; however, we have not been able to determine the cause of the problem.

When this happens, restarting Jenkins from the console does not work. Aborting the job does abort it, but the node/pod is not terminated. Deleting the node from the Nodes and Clouds page does not remove it either.

The agent pod does not report any errors.

We have temporarily disabled the node version monitor plugin. The current workaround is to scale the Jenkins controller down to 0 replicas and then back up to 1.
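The scale-down/scale-up workaround can be scripted. The minimal sketch below uses the official `kubernetes` Python client and assumes, hypothetically, that the controller runs as a StatefulSet named `jenkins` in the `jenkins` namespace and that its pods carry the label `app.kubernetes.io/component=jenkins-controller`. These names are placeholders and must be adjusted to the actual installation.

```python
import time

# Hypothetical names: adjust to match your Jenkins installation.
NAMESPACE = "jenkins"
STATEFULSET = "jenkins"
POD_SELECTOR = "app.kubernetes.io/component=jenkins-controller"


def scale_patch(replicas):
    """Body for a PATCH against the StatefulSet's scale subresource."""
    return {"spec": {"replicas": replicas}}


def wait_for_no_pods(core, namespace, selector, timeout_s=300, poll_s=5):
    """Poll until no matching pods remain (Terminating pods still count)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        pods = core.list_namespaced_pod(namespace, label_selector=selector)
        if not pods.items:
            return True
        time.sleep(poll_s)
    return False


def bounce_controller(apps, core, namespace=NAMESPACE, name=STATEFULSET,
                      selector=POD_SELECTOR):
    """Scale the controller to 0, wait for its pod to go, then scale to 1."""
    apps.patch_namespaced_stateful_set_scale(name, namespace, scale_patch(0))
    if not wait_for_no_pods(core, namespace, selector):
        raise TimeoutError("controller pod did not terminate in time")
    apps.patch_namespaced_stateful_set_scale(name, namespace, scale_patch(1))


def main():
    """Entry point; requires `pip install kubernetes` and cluster access."""
    from kubernetes import client, config

    config.load_kube_config()  # use config.load_incluster_config() in-cluster
    bounce_controller(client.AppsV1Api(), client.CoreV1Api())
```

Call `main()` from a machine whose kubeconfig can reach the cluster. Waiting for the old pod to disappear mirrors the manual two-step workaround, and it makes a controller pod stuck in Terminating surface as a timeout instead of passing silently.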

The Kubernetes version is 1.30.

Reproduction steps

There are no reproduction steps, as the issue does not follow any pattern.

amitkbaid added the triage label (Incoming issues that need review) on Nov 7, 2024

github-actions bot commented Nov 7, 2024

Take a look at these similar issues to see if there isn't already a response to your problem:

  1. 70% Jenkins #3603
  2. 70% Jenkins #3308
  3. 70% Jenkins #3285

@dduportal
Contributor

This issue tracker is not for Jenkins questions. Please check the Community Forum.

dduportal closed this as not planned on Nov 7, 2024
dduportal removed the triage label (Incoming issues that need review) on Nov 7, 2024
dduportal added this to the infra-team-sync-2024-11-12 milestone on Nov 7, 2024