-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Failover Confusion in DRPC Action Post Hub Recovery #1179
Fix Failover Confusion in DRPC Action Post Hub Recovery #1179
Conversation
800e01a
to
f94cce6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this change or the algorithm to determine drpc state. Also the problem, why hub recovery changes the failover cluster? Is this about getting a stale drpc after a hub recovery because we use an old backup?
@@ -28,6 +28,10 @@ type DRState string | |||
|
|||
// These are the valid values for DRState | |||
const ( | |||
// WaitForUser, state recorded in DRPC status to indicate that we are | |||
// waiting for the user to take an action after hub recover. | |||
WaitForUser = DRState("WaitForUser") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How ramen knows that there was a hub recovery? what is the difference between a hub recovery and a new install?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ramen does not know. Trying to rebuild its state based on what is out there.
progress, err := r.determineDRPCState(ctx, drpc, drPolicy, placementObj, dstCluster, log) | ||
progress, msg, err := r.determineDRPCState(ctx, drpc, drPolicy, placementObj, dstCluster, log) | ||
|
||
log.Info(msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why log here an not in determineDRPCState()? we pass the logger to the function so it can log whatever needed. Here we log something that it returned without any context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It simply avoids logging before returning. We know we are returning a msg, so we log all rerturned messages in one place rather than doing this.
Log(msg)
return msg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that logging just the msg here without any context will make it hard to read the log line. Willing to let it be addressed in a different PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's do that in a different PR.
Trying to cover all combinations. And there are many. Think of it like this. A hub was recovered, DRPC restored with either current state or an old state, and also the source of truth (managed cluster) is down. |
a25db1d
to
499e88a
Compare
If the DRPC Action is set to Failover, for instance, transitioning from C1 to C2, a subsequent failover request from C2 to C1 creates confusion in Ramen post hub recovery because the action didn't change and the only thing that changed is the destination cluster. The solution is to permit failover from C2 to C1 if C1 is accessible. Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
499e88a
to
502ac08
Compare
If the DRPC Action is set to Failover, for instance, transitioning from C1 to C2, a subsequent failover request from C2 to C1 creates confusion in Ramen post hub recovery because the action didn't change and the only thing that changed is the destination cluster. The solution is to permit failover from C2 to C1 if C1 is accessible.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2258351