-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Common Event API for Remediation cannot start due to node not found #124
base: main
Are you sure you want to change the base?
Conversation
clobrano
commented
Feb 9, 2024
- Update common to v1.15.1
- Use medik8s/common API for remediation cannot start event
Signed-off-by: Carlo Lobrano <c.lobrano@gmail.com>
Signed-off-by: Carlo Lobrano <c.lobrano@gmail.com>
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: clobrano The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
verifyConditionUnset(commonconditions.PermanentNodeDeletionExpectedType) | ||
verifyEvents([]expectedEvent{ | ||
{v1.EventTypeWarning, "RemediationSkippedNodeNotFound", "failed to fetch node", true}, | ||
{v1.EventTypeWarning, "RemediationCannotStart", "Could not get remediation target Node", true}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't you mean to switch RemediationSkippedNodeNotFound
to remediationCannotStartNodeNotFound
instead of RemediationCannotStart
?
{v1.EventTypeWarning, "RemediationCannotStart", "Could not get remediation target Node", true}, | |
{v1.EventTypeWarning, remediationCannotStartNodeNotFound, "Could not get remediation target Node", true}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes from common
, where the reason is RemediationCannotStart
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then maybe we should update Common first.
It seems weird that there areremediationCannotStartNoControllerOwner
, and RemediationCannotStartMachineNotFound
event reasons, which are straightforward, and instead of the old RemediationSkippedNodeNotFound
reason we use generic RemediationCannotStart
instead of remediationCannotStartNodeNotFound
. All Medik8s remediator operators work with nodes, and this generic event is meant for when the node is missing and then remediation cannot start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use generic
RemediationCannotStart
I think it's good that is generic because it's in common
🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why be generic if being more specific still fits everyone? The only downside for this is if we change the CRs API and logic then we might not have this kind of event. If so, then a new event should be created. That seems ok to me.
I think it would make sense to use remediationCannotStartNodeNotFound
in common, so MDR, FAR, SNR (and NHC) could use the same event which would be straightforward, and with a better meaning IMO. I don't see another RemediationCannotStart kind of scenario that fits all the operators, if there is one that could fit some of them and we want to use this event for this use case then it would be fine. But having a more concrete event reason still seems better to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is still the message "Could not get remediation target Node" to clarify.
I'd say, I could use the same reason "remediationSkipped" for the other MDR's event and then clarify either "missing controller owner" or "missing target node"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The common
repo could instead get provide it's reasons
as variables so that the operators can reuse it (e.g. common.RemediationCannotStart
)
// Cluster provider is not set in this test | ||
{commonconditions.PermanentNodeDeletionExpectedType, metav1.ConditionUnknown, v1alpha1.MachineDeletionOnUndefinedProviderReason}}) | ||
verifyEvents([]expectedEvent{ | ||
{v1.EventTypeWarning, "RemediationSkippedNoControllerOwner", noControllerOwnerErrorMsg, true}, | ||
{v1.EventTypeWarning, "RemediationCannotStartNoControllerOwner", noControllerOwnerErrorMsg, true}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you use the variable (remediationCannotStartNoControllerOwner)?
{v1.EventTypeWarning, "RemediationCannotStartNoControllerOwner", noControllerOwnerErrorMsg, true}, | |
{v1.EventTypeWarning, remediationCannotStartNoControllerOwner", noControllerOwnerErrorMsg, true}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good idea, I just need to make it a string though 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to just use the variable. Isn't it enough? 🤔
FYI there are more occurrences of "RemediationCannotStartNoControllerOwner"
that could be modified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to just use the variable. Isn't it enough? 🤔
the variable is not a string, or did I misunderstand the question?
FYI there are more occurrences of "RemediationCannotStartNoControllerOwner" that could be modified**
Thanks, I'll update the other after the other thread then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do need casting for string. remediationCannotStartNoControllerOwner
is of type conditionChangeReason and I though it could be used as string for this matter, but this is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Carlo Lobrano <c.lobrano@gmail.com>
a27d581
to
790f254
Compare
/test 4.15-openshift-e2e |
go.mod
Outdated
@@ -4,7 +4,7 @@ go 1.21 | |||
|
|||
require ( | |||
github.com/go-logr/logr v1.4.1 | |||
github.com/medik8s/common v1.13.0 | |||
github.com/medik8s/common v1.15.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: There is already a newer version of common https://github.com/medik8s/common/tags that we can use