[bug] Current cleanup.yaml ensure-subresources-deleted fails on incomplete installs #145

Open
mallardduck opened this issue Jan 9, 2025 · 0 comments

Per the title, the ensure-subresources-deleted job in the cleanup manifest is prone to errors when an install runs into issues. I observed this in the job logs:

Ensuring HelmCharts and HelmReleases are deleted from cattle-monitoring-system...
waiting for HelmCharts and HelmReleases to be deleted from cattle-monitoring-system... sleeping 3 seconds
waiting for HelmCharts and HelmReleases to be deleted from cattle-monitoring-system... sleeping 3 seconds
waiting for HelmCharts and HelmReleases to be deleted from cattle-monitoring-system... sleeping 3 seconds

This output is expected, but the job should normally complete quickly. So I checked the initial kubectl command the script uses to enumerate the resources to clean up, and found:

# kubectl get helmcharts,helmreleases
error: the server doesn't have a resource type "helmreleases"

So it appears the cleanup script is failing because the helmreleases resource type was never installed. That makes sense, since the PromFed container never started, and in this case the CRDs are managed by the container/operator rather than by a CRD-specific chart.

We should adjust the cleanup script so it does not fail when it encounters these edge cases. While they should be uncommon, failing here only makes it harder for a customer to recover from whatever initial issue broke the install in the first place. One possible approach is sketched below.
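The following is only a minimal sketch of the idea, not the actual script: build the list of resource types from what the API server actually serves before getting/deleting them. The variable names and the delete strategy here are hypothetical and would need to be adapted to the real cleanup job.

types=""
for kind in helmcharts helmreleases; do
  # Both kinds are CRD-backed, so they only appear in `kubectl api-resources`
  # output (as "<plural>.<group>") when their CRD has been installed.
  if kubectl api-resources --no-headers -o name | grep -q "^${kind}\."; then
    types="${types:+${types},}${kind}"
  fi
done

if [ -n "${types}" ]; then
  echo "Ensuring ${types} are deleted from cattle-monitoring-system..."
  kubectl delete "${types}" -n cattle-monitoring-system --all --wait=true
else
  echo "No HelmChart/HelmRelease resource types are served; nothing to clean up."
fi

With a guard like this, a cluster that never got the helmreleases CRD would simply skip that type instead of erroring out of the whole job.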

Further, the add-cleanup-annotations job is subject to a similar problem, though in that case the error appears to be authentication related. It is not a consistent error; if I see it again I will report back and update this issue. I suspect it was a "race condition" (for lack of a better term) where the ServiceAccount was removed before the cleanup script finished, so the script lost access, or something similar.
