[Enhancement] Add possible monitoring and metrics to decide the lifecycle of the Agent Nodes #498

Open
peterzhuamazon opened this issue Oct 10, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@peterzhuamazon
Member

peterzhuamazon commented Oct 10, 2024

As a follow-up to #494.

Removing an agent node after X runs is a quick fix for getting rid of agents with issues, but it is not fine-grained or robust enough when we have a large number of agents.

We could use custom solutions, such as Jenkins plugins, Groovy scripts, or even the metrics cluster, to determine the current status of the agent nodes. We would then define a baseline for agent health and remove a node once it has an outage.

Example plugin: https://plugins.jenkins.io/monitoring/
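A minimal sketch of what "define a baseline for the health of the agents" could look like, written in Python for illustration rather than as a Jenkins plugin or Groovy script. It assumes per-agent metrics (recent build results, free disk, minutes offline) are already collected somewhere, e.g. the metrics cluster. All class names, fields, and thresholds here are hypothetical, and the actual node removal would still have to go through Jenkins.

```python
from dataclasses import dataclass, field


@dataclass
class AgentHealth:
    """Hypothetical per-agent snapshot, e.g. assembled from the metrics cluster."""
    name: str
    recent_results: list = field(default_factory=list)  # True = build succeeded
    disk_free_gb: float = 0.0
    offline_minutes: int = 0


@dataclass
class Baseline:
    """Hypothetical health thresholds; real values would come from observed data."""
    max_failure_rate: float = 0.5
    min_results: int = 4           # don't judge an agent on too few builds
    min_disk_free_gb: float = 10.0
    max_offline_minutes: int = 30


def failure_rate(results):
    """Fraction of recent builds that failed (0.0 when there is no history)."""
    if not results:
        return 0.0
    return results.count(False) / len(results)


def unhealthy_reasons(agent, baseline):
    """List every baseline check the agent currently violates."""
    reasons = []
    if (len(agent.recent_results) >= baseline.min_results
            and failure_rate(agent.recent_results) > baseline.max_failure_rate):
        reasons.append("failure rate")
    if agent.disk_free_gb < baseline.min_disk_free_gb:
        reasons.append("low disk")
    if agent.offline_minutes > baseline.max_offline_minutes:
        reasons.append("offline")
    return reasons


def agents_to_retire(agents, baseline):
    """Map each agent that breaks the baseline to the reasons it broke it."""
    retire = {}
    for agent in agents:
        reasons = unhealthy_reasons(agent, baseline)
        if reasons:
            retire[agent.name] = reasons
    return retire


agents = [
    AgentHealth("agent-1", [True, True, False, True], disk_free_gb=50),
    AgentHealth("agent-2", [False, False, False, True], disk_free_gb=50),
    AgentHealth("agent-3", [True], disk_free_gb=2, offline_minutes=45),
]
print(agents_to_retire(agents, Baseline()))
# {'agent-2': ['failure rate'], 'agent-3': ['low disk', 'offline']}
```

Unlike a fixed run count, this only retires agents that actually misbehave, and it records why, which makes false positives easier to tune out.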
cc: @getsaurabh02 @prudhvigodithi @gaiksaya

Thanks.

@rishabh6788
Collaborator

We already have a configuration that cleans up agent nodes after a certain number of executions, and we haven't noticed any agents getting stuck for long periods. @gaiksaya @prudhvigodithi, thoughts? Can we close this?

Projects
Status: 📦 Backlog
Development

No branches or pull requests

3 participants