Service(s)
infra.ci.jenkins.io
Summary
I am running Jenkins 2.401 on Kubernetes. The controller runs on JDK 11. The setup uses two inbound agents to run different pipeline jobs:
inbound agent 1 uses the 3198.v03a_401881f3e-1-jdk17 Docker image (JDK 17)
inbound agent 2 uses the 3107.v665000b_51092-15 Docker image (JDK 11)
We are intermittently seeing nodes go offline as a job begins, and both agents are affected. There is no discernible pattern: the error may appear after 15 pipeline submissions, or not appear for 30.
Below is the message written to the controller log file. The agent log file shows no errors.
```
2024-10-01 04:51:46.333+0000 [id=7688] WARNING j.m.api.Metrics$HealthChecker#execute: Some health checks are reporting as unhealthy: [thread-deadlock : [Computer.threadPoolForRemoting [#1] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@4acb9e4c (owned by jenkins.util.Timer [#1]): at java.base@11.0.19/jdk.internal.misc.Unsafe.park(Native Method)
```
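For reference, the thread-deadlock entry is produced by the Metrics plugin's health check, which relies on the JVM's built-in deadlock detection. Below is a minimal, self-contained sketch of the same probe (the class name `DeadlockProbe` is hypothetical; the APIs are standard `java.lang.management`):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockProbe {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() covers ownable synchronizers such as
        // ReentrantLock (the lock named in the log), not just monitors.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("No deadlocked threads.");
            return;
        }
        // Dump the stack trace and held locks for each deadlocked thread.
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            System.out.println(info);
        }
    }
}
```

Running `jstack <pid>` against the controller JVM while a node is stuck should capture the same information, including the full stacks of `Computer.threadPoolForRemoting` and `jenkins.util.Timer`.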
I came across a recommendation in the Jenkins JIRA to upgrade the Pipeline plugin; however, we are already running the latest version.
We plan to upgrade Jenkins and the agents to JDK 17 starting next year, but we have been unable to determine the cause of the problem.
When this happens, restarting Jenkins from the console does not work. Aborting a job succeeds, but it does not terminate the node/pod. Going into Nodes and Clouds to delete the node does not remove it either.
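One possible manual cleanup is to force-disconnect and remove the stuck node from the script console, sketched below. The console runs Groovy, which accepts this Java-style code; `agent-pod-xyz` is a placeholder node name. If the controller itself is holding the deadlocked locks, this may hang just as the UI does:

```java
import jenkins.model.Jenkins;
import hudson.model.Computer;
import hudson.model.Node;

Jenkins j = Jenkins.get();
Computer c = j.getComputer("agent-pod-xyz");
if (c != null) {
    c.disconnect(null);   // drop the remoting channel first
}
Node n = j.getNode("agent-pod-xyz");
if (n != null) {
    j.removeNode(n);      // then remove the node entry itself
}
```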
The agent pod does not report any errors.
We have temporarily disabled the node version monitor plugin. The workaround currently in place is to scale the Jenkins controller down to 0 and then back up to 1.
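A sketch of automating that workaround, assuming kubectl is on the PATH and the controller runs as a StatefulSet named `jenkins-controller` in a `jenkins` namespace (both names hypothetical here):

```java
import java.io.IOException;

public class RestartController {
    static void scale(int replicas) throws IOException, InterruptedException {
        // Shells out to kubectl rather than using a Kubernetes client library.
        Process p = new ProcessBuilder(
                "kubectl", "scale", "statefulset/jenkins-controller",
                "--namespace", "jenkins",
                "--replicas=" + replicas)
            .inheritIO()
            .start();
        if (p.waitFor() != 0) {
            throw new IOException("kubectl scale failed with exit code " + p.exitValue());
        }
    }

    public static void main(String[] args) throws Exception {
        scale(0);
        // Give the controller pod time to terminate before scaling back up.
        Thread.sleep(30_000);
        scale(1);
    }
}
```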
The Kubernetes version is 1.30.
Reproduction steps
There are no reproduction steps; the issue does not follow any pattern.