Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deleting PVCs created by VolumeClaimTemplate is flaky #7138

Closed
QuanZhang-William opened this issue Sep 20, 2023 · 1 comment · Fixed by #7149
Closed

Deleting PVCs created by VolumeClaimTemplate is flaky #7138

QuanZhang-William opened this issue Sep 20, 2023 · 1 comment · Fixed by #7149
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@QuanZhang-William
Copy link
Member

Expected Behavior

pvcs created by VolumeClaimTemplate should be deleted when the owning pipelinerun is completed with coschedule:pipelineruns.

Actual Behavior

There is race condition, some pvcs are randomly left in terminating status

Steps to Reproduce the Problem

  1. Set disable-affinity-assistant: "true" and coschedule: "pipelineruns"
  2. Run the example PipelineRun multiple times
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: demo-pipeline-run-template-only-
spec:
  pipelineSpec:
    workspaces:
    - name: workspace1
    - name: workspace2
    tasks:
    - name: foo
      workspaces:
      - name: workspace1
      - name: workspace2
      taskSpec:
        steps:
        - image: busybox
          script: echo "$(workspaces.workspace1.path) $(workspaces.workspace2.path)"
  workspaces:
  - name: workspace1
    volumeClaimTemplate:
      metadata:
        name: workspace1
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 16Mi
  - name: workspace2
    volumeClaimTemplate:
      metadata:
        name: workspace2
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 16Mi
  1. Check if pvcs and some PVCs are left in terminating status randomly
$ k get pvc
NAME                                                    STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
workspace1-4ed6374fea-affinity-assistant-c37495a2b7-0   Terminating   pvc-fa53b790-8811-403e-a6a2-dabe9e817d55   1Gi        RWO            standard       2m29s

Additional Info

  • Kubernetes version:

    Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.8", GitCommit:"0ce7342c984110dfc93657d64df5dc3b2c0d1fe9", GitTreeState:"clean", BuildDate:"2023-03-15T13:39:54Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.10-gke.2700", GitCommit:"df2448d7739c44e41ee999b9065ce506cab86344", GitTreeState:"clean", BuildDate:"2023-06-22T09:25:37Z", GoVersion:"go1.19.9 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

v0.51.0
@QuanZhang-William QuanZhang-William added the kind/bug Categorizes issue or PR as related to a bug. label Sep 20, 2023
@QuanZhang-William
Copy link
Member Author

Thanks for reporting the bug @jimmyjones2 (see #5776 (comment))!

The race condition is caused by the wrong order of purging finalizer and deleting PVCs.

The root cause is that the k8s PVCProtectionController adds the finalizer back if it detects the PVC is being used by a pod and the the kubernetes.io/pvc-protection is removed by Tekton. If the add back action happens before the PVC deletion call is completed, the PVC will be left in terminating status.

Switching the the order of the calls should fix the race condition.

/assign

QuanZhang-William added a commit to QuanZhang-William/pipeline that referenced this issue Sep 25, 2023
fixes [tektoncd#7138][tektoncd#7138]. In `coschedule:pipelineruns` mode, the `pvcs` created by `VolumeClaimTemplate` should be automatically deleted when the owning `pipelinerun` is completed.
To delete a `pvc` used by a pod (pipelinerun), the [kubernetes.io/pvc-protection][finalizer] finalizer must be removed.

Prior to this commit, the Tekton reconciler attempted to remove the `kubernetes.io/pvc-protection` finalizer first and then deletes the created pvcs. This results
in race condition since `PVCProtectionController` tries to add back the finalizer if the `pvc` is in `bounded` status . If the add back action happens before the PVC deletion call is completed,
the `PVC` will be left in `terminating` status.

This commit changes the reconciler to delete the `PVC` first and then removes the `finalizer` to fix the race condition. This commit also adds integration test to
validate `pvc` status when a `pipelinerun` is completed.

/kind bug

[tektoncd#7138]: tektoncd#7138
[finalizer]: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#storage-object-in-use-protection
QuanZhang-William added a commit to QuanZhang-William/pipeline that referenced this issue Sep 25, 2023
fixes [tektoncd#7138][tektoncd#7138]. In `coschedule:pipelineruns` mode, the `pvcs` created by `VolumeClaimTemplate` should be automatically deleted when the owning `pipelinerun` is completed.
To delete a `pvc` used by a pod (pipelinerun), the [kubernetes.io/pvc-protection][finalizer] finalizer must be removed.

Prior to this commit, the Tekton reconciler attempted to remove the `kubernetes.io/pvc-protection` finalizer first and then deletes the created pvcs. This results
in race condition since `PVCProtectionController` tries to add back the finalizer if the `pvc` is in `bounded` status . If the add back action happens before the PVC deletion call is completed,
the `PVC` will be left in `terminating` status.

This commit changes the reconciler to delete the `PVC` first and then removes the `finalizer` to fix the race condition.

This commit removes `TestPurgeFinalizerAndDeletePVCForWorkspace` UT, since the finalizer behavior cannot be mocked when a PVC is deleted.
This commit also adds integration test to cover this scenario.

/kind bug

[tektoncd#7138]: tektoncd#7138
[finalizer]: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#storage-object-in-use-protection
QuanZhang-William added a commit to QuanZhang-William/pipeline that referenced this issue Sep 28, 2023
fixes [tektoncd#7138][tektoncd#7138]. In `coschedule:pipelineruns` mode, the `pvcs` created by `VolumeClaimTemplate` should be automatically deleted when the owning `pipelinerun` is completed.
To delete a `pvc` used by a pod (pipelinerun), the [kubernetes.io/pvc-protection][finalizer] finalizer must be removed.

Prior to this commit, the Tekton reconciler attempted to remove the `kubernetes.io/pvc-protection` finalizer first and then deletes the created pvcs. This results
in race condition since `PVCProtectionController` tries to add back the finalizer if the `pvc` is in `bounded` status . If the add back action happens before the PVC deletion call is completed,
the `PVC` will be left in `terminating` status.

This commit changes the reconciler to delete the `PVC` first and then removes the `finalizer` to fix the race condition.

This commit removes `TestPurgeFinalizerAndDeletePVCForWorkspace` UT, since the finalizer behavior cannot be mocked when a PVC is deleted.
This commit also adds integration test to cover this scenario.

/kind bug

[tektoncd#7138]: tektoncd#7138
[finalizer]: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#storage-object-in-use-protection
QuanZhang-William added a commit to QuanZhang-William/pipeline that referenced this issue Oct 2, 2023
fixes [tektoncd#7138][tektoncd#7138]. In `coschedule:pipelineruns` mode, the `pvcs` created by `VolumeClaimTemplate` should be automatically deleted when the owning `pipelinerun` is completed.
To delete a `pvc` used by a pod (pipelinerun), the [kubernetes.io/pvc-protection][finalizer] finalizer must be removed.

Prior to this commit, the Tekton reconciler attempted to remove the `kubernetes.io/pvc-protection` finalizer first and then deletes the created pvcs. This results
in race condition since `PVCProtectionController` tries to add back the finalizer if the `pvc` is in `bounded` status . If the add back action happens before the PVC deletion call is completed,
the `PVC` will be left in `terminating` status.

This commit changes the reconciler to delete the `PVC` first and then removes the `finalizer` to fix the race condition.

This commit removes `TestPurgeFinalizerAndDeletePVCForWorkspace` UT, since the finalizer behavior cannot be mocked when a PVC is deleted.
This commit also adds integration test to cover this scenario.

/kind bug

[tektoncd#7138]: tektoncd#7138
[finalizer]: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#storage-object-in-use-protection
tekton-robot pushed a commit that referenced this issue Oct 3, 2023
fixes [#7138][#7138]. In `coschedule:pipelineruns` mode, the `pvcs` created by `VolumeClaimTemplate` should be automatically deleted when the owning `pipelinerun` is completed.
To delete a `pvc` used by a pod (pipelinerun), the [kubernetes.io/pvc-protection][finalizer] finalizer must be removed.

Prior to this commit, the Tekton reconciler attempted to remove the `kubernetes.io/pvc-protection` finalizer first and then deletes the created pvcs. This results
in race condition since `PVCProtectionController` tries to add back the finalizer if the `pvc` is in `bounded` status . If the add back action happens before the PVC deletion call is completed,
the `PVC` will be left in `terminating` status.

This commit changes the reconciler to delete the `PVC` first and then removes the `finalizer` to fix the race condition.

This commit removes `TestPurgeFinalizerAndDeletePVCForWorkspace` UT, since the finalizer behavior cannot be mocked when a PVC is deleted.
This commit also adds integration test to cover this scenario.

/kind bug

[#7138]: #7138
[finalizer]: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#storage-object-in-use-protection
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant