
Build init logs don't print hostname if there is an initialization error #3089

Open · jithine opened this issue Apr 4, 2024 · 5 comments
jithine (Member) commented Apr 4, 2024

What happened:

After the changes in screwdriver-cd/executor-k8s#186, the build hostname is not displayed if the launcher is not able to run. This scenario can happen when build init fails.

We were updating the launcher from v6.0.182 to v6.0.198.

What you expected to happen:

Build init logs should always show the hostname where the build was scheduled.

How to reproduce it:

jithine added the bug label on Apr 4, 2024
jithine (Member, Author) commented Apr 4, 2024

We should discuss the approach from #2882

VonnyJap (Member) commented Apr 29, 2024

I wonder if there is still a chance that the build node information is missing even after the revert: if no resource is available at the first update, the node name is not displayed in the UI, and the launcher fails to run in the init step, so it never updates the build stats with the node name.

But I believe that when a build is pending now, it is automatically put into the retry queue? And how many retry attempts happen before the build has a resource to be scheduled on?
https://github.com/screwdriver-cd/buildcluster-queue-worker/blob/c6086f39ca196f89ecd38d46eb48f344eb8adf14/receiver.js#L110-L113

In addition, the build status is not updated here:
https://github.com/screwdriver-cd/executor-k8s/blob/9c72331ee8e70622d174cdb6eb4993b4059c7a15/index.js#L723-L773
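
For illustration only, here is a rough sketch of how the node name could be pushed to the build stats as soon as the scheduler assigns the pod, so the hostname is recorded even if the launcher never runs. The `updateBuildStats` helper is hypothetical and the use of `@kubernetes/client-node` is an assumption for the sketch; the real executor-k8s code may talk to the Kubernetes API differently.

```js
// Hedged sketch only -- this is NOT the actual executor-k8s implementation.
// Assumptions: the @kubernetes/client-node library (0.x-style API) and a
// hypothetical updateBuildStats(buildId, stats) helper that pushes
// { hostname } to the Screwdriver API.
const k8s = require('@kubernetes/client-node');

const kc = new k8s.KubeConfig();

kc.loadFromDefault();
const core = kc.makeApiClient(k8s.CoreV1Api);

// Poll the pod until the scheduler assigns a node, then record the hostname
// in the build stats so it is visible even if the launcher never starts.
async function recordNodeName(podName, namespace, buildId, updateBuildStats) {
    for (let attempt = 0; attempt < 10; attempt += 1) {
        const { body: pod } = await core.readNamespacedPod(podName, namespace);
        const nodeName = pod.spec && pod.spec.nodeName;

        if (nodeName) {
            await updateBuildStats(buildId, { hostname: nodeName });

            return nodeName;
        }

        // Pod is still Pending (e.g. insufficient resources); wait and retry.
        await new Promise(resolve => setTimeout(resolve, 3000));
    }

    // Still unscheduled after all attempts; leave the stats untouched.
    return null;
}
```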

VonnyJap (Member) commented May 3, 2024

@yk634 - I created a draft PR to implement your suggestion here. Can you please take a look and check whether it makes sense? If not, please suggest changes according to what you proposed initially.

yk634 (Contributor) commented May 7, 2024

@VonnyJap
It's been so long since I looked into this that I'm not sure my memory is correct...

If the verify method is executed while the pod has not started due to a lack of resources, I am not sure whether it is possible to get the value from spec.nodeName.

If it is possible to get the value, I think the PR fix is fine.

Assuming the value cannot be retrieved, I was thinking of having the executor-k8s verify rerun via the retry queue even in the case of insufficient resources.
(Currently, I don't think builds that fail due to insufficient resources are re-executed via the retry queue.)
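
As a rough illustration of that idea (not the actual executor-k8s code): in verify, read `spec.nodeName`; if it is empty and the pod reports `Unschedulable`, raise a retryable error so the build goes back to the retry queue instead of failing. The `getPod` helper and `RetryableError` are hypothetical names used only for this sketch.

```js
// Hedged sketch of the idea above -- NOT the actual executor-k8s verify().
// Assumptions: a getPod() helper returning the pod object from the Kubernetes
// API, and a RetryableError that the buildcluster-queue-worker would treat as
// "re-queue the build" instead of "fail the build".
class RetryableError extends Error {}

async function verifyPodScheduled(getPod) {
    const pod = await getPod();
    const nodeName = pod.spec && pod.spec.nodeName;

    if (nodeName) {
        // Scheduled: the hostname is known even if the launcher later fails.
        return nodeName;
    }

    const conditions = (pod.status && pod.status.conditions) || [];
    const unschedulable = conditions.some(
        c => c.type === 'PodScheduled' && c.status === 'False' && c.reason === 'Unschedulable'
    );

    if (unschedulable) {
        // Insufficient resources: ask the queue worker to retry later rather
        // than failing the build outright.
        throw new RetryableError('Pod is unschedulable; re-queue the build');
    }

    throw new Error('Pod has no nodeName and is not reported as unschedulable');
}
```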
