GKE - Restore Gets Stuck due to Cluster Reconciliation #3522
Unanswered
codeviking asked this question in Community support Q&A
I'm using Velero with a GKE cluster that has a moderate number of workloads (~130 namespaces, each with ~2 deployments of a couple of replicas each).

I'm trying to use Velero to do a full restore onto a newly provisioned GKE cluster. The first time I attempt the restore it always fails and ends up stuck in the `InProgress` state. This occurs because the GKE cluster enters the `RECONCILING` state while the restore is occurring, which causes the Kube API to temporarily become inaccessible (which in turn causes the restore to fail).

I can "fix" this by executing the same restore after the cluster exits the `RECONCILING` state, but I'm left with a Velero restore that's forever stuck in the `InProgress` state. Maybe that's ok and I should just ignore it, but I don't like leaving things in an indeterminate state 😅.

I'm curious if others are running into this and if there's a workaround. I also wonder if this could be solved by adding a `velero restore resume` command, which would allow an operator to manually restart a restore. In my mind it's ok if it simply starts from the beginning since Velero is good about not overwriting things that already exist.
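
For reference, here is a minimal sketch of the "wait, then restore" approach described above: poll the cluster status and only start the restore once GKE reports `RUNNING`. It assumes `gcloud` and `velero` are installed and authenticated. `my-cluster`, `us-central1`, and `my-backup` are hypothetical placeholders, and the timeout and poll interval are arbitrary. The cluster could still begin reconciling mid-restore, so this only narrows the window, and it does nothing for a restore that is already stuck in `InProgress`.

```python
import subprocess
import time
from typing import Callable


def gke_cluster_status(cluster: str, location: str) -> str:
    """Return the GKE cluster status string (e.g. RUNNING, RECONCILING)."""
    result = subprocess.run(
        ["gcloud", "container", "clusters", "describe", cluster,
         "--location", location, "--format", "value(status)"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_until_running(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,   # arbitrary default, tune for your cluster
    poll_s: float = 15.0,        # arbitrary default
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status until it returns RUNNING; False if the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == "RUNNING":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Hypothetical names; replace with your own cluster, location, and backup.
    cluster, location, backup = "my-cluster", "us-central1", "my-backup"
    if wait_until_running(lambda: gke_cluster_status(cluster, location)):
        # --wait blocks until the restore finishes.
        subprocess.run(
            ["velero", "restore", "create", "--from-backup", backup, "--wait"],
            check=True,
        )
    else:
        raise SystemExit("cluster did not reach RUNNING before the timeout")
```

The status getter is passed in as a callable so the polling loop can be exercised without touching a real cluster.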