-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Node agent restart enhancement #7130
Node agent restart enhancement #7130
Conversation
780edca
to
51bb5f6
Compare
294d16a
to
cc9b9b1
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #7130 +/- ##
==========================================
+ Coverage 61.19% 61.70% +0.50%
==========================================
Files 258 258
Lines 27366 27740 +374
==========================================
+ Hits 16747 17117 +370
+ Misses 9434 9420 -14
- Partials 1185 1203 +18 ☔ View full report in Codecov by Sentry. |
cc9b9b1
to
e7e5da4
Compare
e7e5da4
to
3df177e
Compare
I am thinking if this could lead to other issues. if you think restart of node agent is likely when it is running out of mem / cpu due to high load in backing up/restoring certain large volumes Here since today we cancel it, we atleast terminate and fail early. With your changes if we re-put, we might again cause the node agent to restart. This might be a vicious cycle. |
yes, What you considered is really a problem. we should cancel the in progress CR directly |
d6807d4
to
56233a4
Compare
err = UpdateDataDownloadWithRetry(ctx, cli, types.NamespacedName{Namespace: dd.Namespace, Name: dd.Name}, | ||
r.logger.WithField("dataupload", dd.Name), func(dataDownload *velerov2alpha1api.DataDownload) { | ||
dataDownload.Spec.Cancel = true | ||
dataDownload.Status.Message = fmt.Sprintf("found a datadownload %s/%s is being deleted, mark it as cancel", dd.Namespace, dd.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Message is not correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, Updated
if err != nil { | ||
r.logger.WithError(err).Errorf("failed to set cancel flag with error %s", err.Error()) | ||
} | ||
r.logger.Warn(dd.Status.Message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this log ahead, otherwise, when UpdateDataDownloadWithRetry
returns an error, failed to set cancel flag with error %s
will be first printed, which is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
}) | ||
|
||
if err != nil { | ||
logger.WithError(errors.WithStack(err)).Errorf("failed to mark datadownload %q into prepared", dd.GetName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message, into prepared
should be into cancelled
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
r.logger.WithError(errors.WithStack(err)).Errorf("failed to mark dataupload %q cancel", du.GetName()) | ||
continue | ||
} | ||
r.logger.WithField("dataupload", du.GetName()).Warn(du.Status.Message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same, move it ahead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
}) | ||
|
||
if err != nil { | ||
logger.WithError(errors.WithStack(err)).Errorf("failed to mark dataupload %q into prepared", du.GetName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
into cancelled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
logger.WithError(errors.WithStack(err)).Errorf("failed to mark datadownload %q into prepared", dd.GetName()) | ||
continue | ||
} | ||
logger.WithField("datadownload", dd.GetName()).Debug("mark datadownload into prepared") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
into cancelled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
logger.WithError(errors.WithStack(err)).Errorf("failed to mark dataupload %q into prepared", du.GetName()) | ||
continue | ||
} | ||
logger.WithField("dataupload", du.GetName()).Debug("mark dataupload into prepared") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
into cancelled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
4a01c2d
to
17e3354
Compare
// CancelAcceptedDataDownload will cancel the accepted data download | ||
func (r *DataDownloadReconciler) CancelAcceptedDataDownload(ctx context.Context, cli client.Client, ns string) { | ||
r.logger.Infof("Canceling accepted data for node %s", r.nodeName) | ||
dataDownloads, err := r.findAcceptDataDownloadsByNodeLable(ctx, cli, ns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: label
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, updated
Signed-off-by: Ming Qiu <mqiu@vmware.com>
17e3354
to
98a56eb
Compare
Thank you for contributing to Velero!
Please add a summary of your change
Enhance the recoverability of backup & restore in data-mover
Prepared
phases when the node-agent restarts instead of canceling directly.velero.io/accepted-by
and cancel it when node-agent restarts.Does your change fix a particular issue?
Fixes #(issue)
#6727
Please indicate you've done the following:
/kind changelog-not-required
as a comment on this pull request.site/content/docs/main
.