We already have a configuration that cleans up agent nodes after a certain number of executions, and we haven't noticed any issues with an agent being stuck for a long period of time. @gaiksaya @prudhvigodithi thoughts? Can we close it?
As a follow-up to this issue: #494.
Removing an agent node after X runs is a quick fix for getting rid of agents with issues, but it is not fine-grained or robust enough when we have a lot of agents.
We could use custom solutions, such as Jenkins plugins, Groovy scripts, or even the metrics cluster, to find out the current status of the agent nodes. The idea is to define a baseline for agent health and remove a node as soon as it has an outage.
Example plugin: https://plugins.jenkins.io/monitoring/
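To make the baseline idea a bit more concrete, here is a minimal Python sketch that flags agents from the JSON returned by Jenkins' `/computer/api/json` endpoint. The baseline used here (an agent is unhealthy if it is offline without having been deliberately marked temporarily offline) is only an assumption for illustration, as are the function names and the ignore list. Authentication and the actual node removal are left out.

```python
import json
import urllib.request


def unhealthy_agents(computer_payload, ignore=("Built-In Node",)):
    """Return display names of agents that fall below a simple health baseline.

    Assumed baseline (not from the issue): a node is unhealthy when it is
    offline and was not deliberately marked temporarily offline.
    """
    flagged = []
    for node in computer_payload.get("computer", []):
        name = node.get("displayName", "")
        if name in ignore:
            continue
        if node.get("offline") and not node.get("temporarilyOffline"):
            flagged.append(name)
    return flagged


def fetch_computer_payload(jenkins_url):
    """Fetch node status from Jenkins (hypothetical helper, no authentication)."""
    url = f"{jenkins_url.rstrip('/')}/computer/api/json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

A scheduled job could call `unhealthy_agents(fetch_computer_payload(...))` and hand the resulting names to whatever mechanism we choose for removing nodes.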
cc: @getsaurabh02 @prudhvigodithi @gaiksaya
Thanks.