Ensure recoverability of backup/restore from WaitingForPluginOperations state during velero server restart #6727
Comments
Related: #6710 |
In the 1.13 context, I'll drive towards completing and validating these POCs. If a POC does not work, I will work towards identifying the action items needed here. |
@anshulahuja98 Thanks! And let me add this issue to 1.13 milestone, if we see any problem later, we can move it out. |
@anshulahuja98 Yes, I was going to create a bug on this today. Let me find the PR that added this. That PR canceled data upload/download on node agent restart (which we need), but it also failed WaitingForPluginOperations backups and canceled data upload/download on velero pod restart, and we don't really want either of those. |
@anshulahuja98 this was it: #6461 |
@sseago thanks for sharing this. I'll check these changes and raise a PR to fix this behavior. |
@qiuming-best since you made these changes, do you see any concern with us changing the behaviour so that WaitingForPluginOperations backups are not failed on velero pod restart? |
@anshulahuja98 It's fine to revert the changes. I didn't know much about BIAv2 and RIAv2 at that time, so it was too rough to simply let the backup fail when the velero pod restarted. If the velero pod that is doing the backup or restore doesn't restart, we really don't need to fail the backup or restore when another velero pod restarts. |
Great |
As the current behavior of Velero server, once it restarts, it marks all the backups as |
Yes correct. That will take care of both these things. |
I am afraid that is not enough, because the existing velero server code marks all the running backup CRs as Failed. |
I understood now what you are saying. Will do the required changes for that also. |
thanks for your input @Lyndon-Li |
I will prioritize this and try to complete it next week. |
"the existing velero server code marks all the running backup CRs as Failed." -- for InProgress backups, this is still correct. If a backup has not progressed to WaitingForPluginOperations or Finalizing, then the only option is to fail it and start over with a new backup. Without failing InProgress backups, they will be listed as InProgress forever. |
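The rule described in the comment above can be sketched as a small phase check. This is only an illustration under stated assumptions: the `BackupPhase` type and its constants are declared locally using the phase names mentioned in this thread, and `shouldFailOnRestart` is a hypothetical helper, not a function from the Velero codebase.

```go
package main

import "fmt"

// BackupPhase is declared locally for illustration; it is not imported from Velero.
type BackupPhase string

// Phase names as they appear in this thread.
const (
	PhaseInProgress                 BackupPhase = "InProgress"
	PhaseWaitingForPluginOperations BackupPhase = "WaitingForPluginOperations"
	PhaseFinalizing                 BackupPhase = "Finalizing"
)

// shouldFailOnRestart is a hypothetical helper. It reports whether a backup
// found in the given phase at server startup should be marked Failed.
// Only InProgress backups are failed: their core flow was interrupted and
// cannot be resumed. Backups that already reached WaitingForPluginOperations
// or Finalizing are left alone so that tracking can pick them up again.
func shouldFailOnRestart(phase BackupPhase) bool {
	return phase == PhaseInProgress
}

func main() {
	phases := []BackupPhase{
		PhaseInProgress,
		PhaseWaitingForPluginOperations,
		PhaseFinalizing,
	}
	for _, p := range phases {
		fmt.Printf("%s: fail on restart = %t\n", p, shouldFailOnRestart(p))
	}
}
```

Keeping the decision in one predicate makes the intended behaviour easy to review: a startup sweep would call it once per backup CR and only patch those for which it returns true.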
Describe the problem/challenge you have
After the integration of BIAv2-based plugins, once the core flow of a backup/restore is done, it is marked with the WaitingForPluginOperations phase, after which velero asynchronously polls the plugin operations until they complete.
3. Once we have clarity on the above, in the context of the CSI data mover implementation, we should ensure the above assumptions are not broken and that we can recover tracking of DataUpload/Download etc.
Describe the solution you'd like
Anything else you would like to add:
Environment:
Velero version (use velero version):
Kubernetes version (use kubectl version):
OS (e.g. from /etc/os-release):
Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.