[Enhancement] Add possible monitoring and metrics to decide the lifecycle of the Agent Nodes #498

Open
peterzhuamazon opened this issue Oct 10, 2024 · 0 comments
Labels
enhancement New feature or request

peterzhuamazon commented Oct 10, 2024

This is a follow-up to #494.

Removing an agent node after X runs is a quick fix for getting rid of agents with issues, but it is not fine-grained or robust enough when we have a lot of agents.

We could use a custom solution, such as Jenkins plugins, Groovy scripts, or even the metrics cluster, to find out the current status of the agent nodes. The idea is to define a baseline for agent health and remove a node once it has an outage.
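To make the baseline idea concrete, here is a rough sketch in Python of what the check could look like. It is only an illustration: the `AgentSample` fields, the `Baseline` thresholds, and the function names are all hypothetical, and how the samples are actually collected (plugin, Groovy script, or metrics cluster) is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSample:
    """One health snapshot for an agent node (all fields are hypothetical)."""
    name: str
    online: bool
    free_disk_gb: float
    response_ms: float
    recent_failures: int  # failed builds within some look-back window


@dataclass(frozen=True)
class Baseline:
    """Illustrative thresholds only; real values would need tuning."""
    min_free_disk_gb: float = 10.0
    max_response_ms: float = 2000.0
    max_recent_failures: int = 3


def violations(sample: AgentSample, baseline: Baseline) -> list[str]:
    """Return the list of baseline checks that this sample fails."""
    problems = []
    if not sample.online:
        problems.append("offline")
    if sample.free_disk_gb < baseline.min_free_disk_gb:
        problems.append("low disk")
    if sample.response_ms > baseline.max_response_ms:
        problems.append("slow response")
    if sample.recent_failures > baseline.max_recent_failures:
        problems.append("too many failures")
    return problems


def agents_to_remove(samples: list[AgentSample],
                     baseline: Baseline) -> dict[str, list[str]]:
    """Map each unhealthy agent name to the reasons it fell below baseline."""
    flagged = {}
    for sample in samples:
        problems = violations(sample, baseline)
        if problems:
            flagged[sample.name] = problems
    return flagged
```

Returning the reasons alongside each agent name, rather than a bare list of names, would let the removal job log why a node was retired, which should help when tuning the thresholds later.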

Example plugin: https://plugins.jenkins.io/monitoring/
cc: @getsaurabh02 @prudhvigodithi @gaiksaya

Thanks.

github-actions bot added the untriaged label Oct 10, 2024
gaiksaya added the enhancement label and removed the untriaged label Oct 10, 2024
Projects
Status: Backlog