Cleanup of kind cluster #6768

jainpulkit22 · 2024-10-24T09:37:48Z

Describe the bug
The CI jobs fail because of a panic during cleanup of existing kind clusters. The current cleanup function for kind clusters tries to get the creation timestamp of every available kind cluster using the command
kubectl get nodes --context kind-$kind_cluster_name -o json -l node-role.kubernetes.io/control-plane | jq -r '.items[0].metadata.creationTimestamp'
Sometimes another job running on the same VM has just started and its cluster creation is still in progress, so the context of that cluster is not ready yet. When a second job then runs the cleanup before creating its own cluster, it gets stuck at this step, panics, and fails.

This does not only happen with parallel job runs. If a job is aborted during the cluster creation phase, the context of that kind cluster never becomes available. Any new job that runs on this testbed and calls the cleanup function will then fail, because it tries to fetch the context of every cluster listed by kind get clusters using the above command and panics.
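For illustration, here is a minimal sketch of a lookup that tolerates a missing or not-yet-ready context instead of failing the whole cleanup. It is not the actual clean_kind code and not necessarily what the fix does: the function names get_creation_timestamp and report_kind_clusters and the 10s request timeout are assumptions made for this example, and it uses kubectl's jsonpath output in place of the jq pipeline from the command above. What to do with a skipped cluster (leave it alone or delete it) is a policy decision the sketch leaves open.

```shell
#!/bin/sh
# Sketch only: function names and the 10s timeout are illustrative assumptions.

# Print the creation timestamp of the control-plane node of a kind cluster.
# Returns non-zero (instead of aborting the caller) when the context is
# missing, not ready yet, or the query yields no timestamp.
get_creation_timestamp() {
  local cluster_name="$1"
  local ts
  ts=$(kubectl get nodes --context "kind-${cluster_name}" \
        --request-timeout=10s \
        -l node-role.kubernetes.io/control-plane \
        -o jsonpath='{.items[0].metadata.creationTimestamp}' 2>/dev/null) || return 1
  [ -n "$ts" ] || return 1
  printf '%s\n' "$ts"
}

# Walk every cluster reported by `kind get clusters` and classify it, so that
# one unreachable context does not fail the cleanup for all the others.
report_kind_clusters() {
  kind get clusters 2>/dev/null | while IFS= read -r cluster_name; do
    [ -n "$cluster_name" ] || continue
    if ts=$(get_creation_timestamp "$cluster_name"); then
      echo "ready ${cluster_name} ${ts}"
    else
      echo "skipped ${cluster_name} (context not ready)"
    fi
  done
}
```

A cleanup function could call report_kind_clusters and apply its age check only to the lines marked ready.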

To Reproduce
Trigger two kind jobs at the same time on the same VM. Alternatively, trigger one job, abort it as soon as cluster creation starts, and then trigger a new job on the same testbed. In both cases the second job fails because of the panic.

Expected
The jobs should not fail, and cluster creation should succeed.

Actual behavior
The job fails.

Additional context
Reference to current implementation of cleanup function: clean_kind

#5753

rajnkamr · 2024-10-25T06:24:54Z

Duplicate of #5753

jainpulkit22 · 2024-10-25T08:09:25Z

Duplicate of #5753

This is a different issue: it is in the implementation of the context-based cleanup of clusters. The issue you mentioned has already been taken care of; this one is a bug in the implementation that addressed it.
Also, this is not related to cleanup of the Antrea installation. It is related to deletion of the cluster, or in other words the cleanup of the testbed that happens before the start of the test.

jainpulkit22 added the kind/bug and area/test/jenkins labels Oct 24, 2024

jainpulkit22 linked a pull request Oct 25, 2024 that will close this issue

Fix context issue during cleanup of kind clusters #6771

Open

rajnkamr added this to the Antrea v2.3 release milestone Oct 25, 2024

rajnkamr added the duplicate label Oct 25, 2024
