Cleanup of kind cluster #6768

Open
jainpulkit22 opened this issue Oct 24, 2024 · 2 comments · May be fixed by #6771
Labels
area/test/jenkins Issue about jenkins setup code duplicate This issue or pull request already exists kind/bug Categorizes issue or PR as related to a bug.

Comments

@jainpulkit22
Contributor

jainpulkit22 commented Oct 24, 2024

Describe the bug
The CI jobs fail because of a panic during the cleanup of existing kind clusters. The current cleanup function tries to get the creation timestamp of every available kind cluster using the command
kubectl get nodes --context kind-$kind_cluster_name -o json -l node-role.kubernetes.io/control-plane | jq -r '.items[0].metadata.creationTimestamp'
Sometimes another job on the same VM has just started and its cluster creation is still in progress, so the context for that cluster is not ready yet. When a second job then runs the cleanup before creating its own cluster, it gets stuck at this step, panics, and fails.

This is not limited to parallel job runs. If a job is aborted during the cluster creation phase, the context of its kind cluster will not be available. Any new job that runs the cleanup function on this testbed will then try to fetch the context of every cluster listed by kind get clusters using the above command, panic, and fail.
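
To illustrate the failure mode, here is a minimal sketch of a timestamp lookup that tolerates a missing or not-yet-ready context instead of aborting. It is not the actual clean_kind implementation or the fix proposed in the linked PR. The function name get_cluster_creation_ts and the 10s request timeout are assumptions made for illustration only.

```shell
# Hypothetical helper (not part of the Antrea CI scripts): print the creation
# timestamp of the control-plane node of a kind cluster, or return non-zero
# when the context is missing or the cluster is not ready yet.
get_cluster_creation_ts() {
    local cluster_name="$1"
    local ts
    # --request-timeout is a global kubectl option; 10s is an arbitrary choice
    # meant to bound the wait on an unresponsive API server.
    # "// empty" makes jq print nothing (instead of "null") when no node exists.
    ts=$(kubectl get nodes --context "kind-${cluster_name}" \
            --request-timeout=10s -o json \
            -l node-role.kubernetes.io/control-plane 2>/dev/null |
         jq -r '.items[0].metadata.creationTimestamp // empty' 2>/dev/null) || return 1
    # An empty result means the context or the node was not available.
    [ -n "$ts" ] || return 1
    printf '%s\n' "$ts"
}
```

A caller could loop over the output of kind get clusters and treat a non-zero return as "context not available" rather than letting the whole job fail. Whether such a cluster should be skipped (it may belong to a job that is still creating it) or deleted (it may be left over from an aborted job) is a policy decision this sketch does not make.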

To Reproduce
Trigger two kind jobs at the same time on the same VM, or trigger one job, abort it as soon as cluster creation starts, and then trigger a new job on the same testbed. In both cases the second job fails because of the panic.

Expected
The jobs should not fail and cluster creation should be successful.

Actual behavior
The second job panics in the cleanup step and fails.

Additional context
Reference to current implementation of cleanup function: clean_kind

#5753

@jainpulkit22 jainpulkit22 added kind/bug Categorizes issue or PR as related to a bug. area/test/jenkins Issue about jenkins setup code labels Oct 24, 2024
@rajnkamr rajnkamr added this to the Antrea v2.3 release milestone Oct 25, 2024
@rajnkamr rajnkamr added the duplicate This issue or pull request already exists label Oct 25, 2024
@rajnkamr
Contributor

Duplicate of #5753

@jainpulkit22
Contributor Author

jainpulkit22 commented Oct 25, 2024

Duplicate of #5753

This is a different issue: it is in the implementation of the context-based cleanup of clusters. The issue you mentioned has already been taken care of; this one is a bug in the implementation of the fix for that issue.
Also, this is not related to cleanup of the Antrea installation. It is related to deletion of the cluster, or in other words the cleanup of the testbed that happens before the start of the test.
