Use Common Event API for Remediation cannot start due to node not found #124

Draft
wants to merge 4 commits into
base: main

Contributor

@clobrano clobrano commented Feb 9, 2024

  • Update common to v1.15.1
  • Use medik8s/common API for remediation cannot start event

Signed-off-by: Carlo Lobrano <c.lobrano@gmail.com>
Signed-off-by: Carlo Lobrano <c.lobrano@gmail.com>
Contributor

openshift-ci bot commented Feb 9, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Feb 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: clobrano

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Feb 9, 2024
verifyConditionUnset(commonconditions.PermanentNodeDeletionExpectedType)
verifyEvents([]expectedEvent{
- {v1.EventTypeWarning, "RemediationSkippedNodeNotFound", "failed to fetch node", true},
+ {v1.EventTypeWarning, "RemediationCannotStart", "Could not get remediation target Node", true},
Member

Didn't you mean to switch RemediationSkippedNodeNotFound to remediationCannotStartNodeNotFound instead of RemediationCannotStart?

Suggested change
- {v1.EventTypeWarning, "RemediationCannotStart", "Could not get remediation target Node", true},
+ {v1.EventTypeWarning, remediationCannotStartNodeNotFound, "Could not get remediation target Node", true},

Contributor Author

This comes from common, where the reason is RemediationCannotStart

Member

Then maybe we should update Common first.
It seems weird that there are remediationCannotStartNoControllerOwner and RemediationCannotStartMachineNotFound event reasons, which are straightforward, yet the old RemediationSkippedNodeNotFound reason is replaced by the generic RemediationCannotStart rather than remediationCannotStartNodeNotFound. All Medik8s remediator operators work with nodes, and this generic event is meant for the case where the node is missing and remediation therefore cannot start.

Contributor Author

we use generic RemediationCannotStart

I think it's good that it is generic, because it's in common 🤷

Member

But why be generic if being more specific still fits everyone? The only downside is that if we change the CRs' API and logic, we might no longer have this kind of event. In that case a new event should be created, which seems OK to me.

I think it would make sense to use remediationCannotStartNodeNotFound in common, so that MDR, FAR, SNR (and NHC) could use the same event, which would be straightforward and carry a better meaning IMO. I don't see another RemediationCannotStart kind of scenario that fits all the operators. If there is one that fits some of them and we want to use this event for that use case, that would be fine. But having a more concrete event reason still seems better to me.

Contributor Author

There is still the message "Could not get remediation target Node" to clarify.
I'd say I could use the same reason "remediationSkipped" for MDR's other event, and then clarify in the message either "missing controller owner" or "missing target node".

Contributor Author

@clobrano clobrano Feb 12, 2024

The common repo could instead provide its reasons as variables so that the operators can reuse them (e.g. common.RemediationCannotStart)
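A minimal sketch of that idea, assuming the shared module exported its event reasons as constants. The names `ReasonRemediationCannotStart`, `Event` and `NodeNotFoundEvent` are hypothetical and only illustrate the pattern; they are not taken from the actual medik8s/common API.

```go
package main

import "fmt"

// Hypothetical exported event reasons. In a shared module these would sit in
// their own package; the names are assumptions, not the medik8s/common API.
const (
	ReasonRemediationCannotStart = "RemediationCannotStart"
)

// Message text as it appears in the diff under review.
const nodeNotFoundMessage = "Could not get remediation target Node"

// Event is a simplified, hypothetical stand-in for a recorded Kubernetes event.
type Event struct {
	Type    string
	Reason  string
	Message string
}

// NodeNotFoundEvent builds the warning an operator would emit when the
// remediation target node is missing.
func NodeNotFoundEvent() Event {
	return Event{
		Type:    "Warning", // same string value as v1.EventTypeWarning
		Reason:  ReasonRemediationCannotStart,
		Message: nodeNotFoundMessage,
	}
}

func main() {
	got := NodeNotFoundEvent()
	// A test can compare against the shared constant instead of retyping
	// the reason as a string literal.
	fmt.Println(got.Reason == ReasonRemediationCannotStart)
}
```

With this layout, operator code and tests would reference the same identifier, so a rename of the reason would show up as a compile error rather than a silently stale literal.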

// Cluster provider is not set in this test
{commonconditions.PermanentNodeDeletionExpectedType, metav1.ConditionUnknown, v1alpha1.MachineDeletionOnUndefinedProviderReason}})
verifyEvents([]expectedEvent{
- {v1.EventTypeWarning, "RemediationSkippedNoControllerOwner", noControllerOwnerErrorMsg, true},
+ {v1.EventTypeWarning, "RemediationCannotStartNoControllerOwner", noControllerOwnerErrorMsg, true},
Member

Why don't you use the variable (remediationCannotStartNoControllerOwner)?

Suggested change
- {v1.EventTypeWarning, "RemediationCannotStartNoControllerOwner", noControllerOwnerErrorMsg, true},
+ {v1.EventTypeWarning, remediationCannotStartNoControllerOwner, noControllerOwnerErrorMsg, true},

Contributor Author

Yes, good idea, I just need to make it a string though 👍

Member

I meant to just use the variable. Isn't it enough? 🤔
FYI there are more occurrences of "RemediationCannotStartNoControllerOwner" that could be modified

Contributor Author

I meant to just use the variable. Isn't it enough? 🤔

the variable is not a string, or did I misunderstand the question?

FYI there are more occurrences of "RemediationCannotStartNoControllerOwner" that could be modified

Thanks, I'll update the others once the other thread is settled, then

Member

You do need a conversion to string. remediationCannotStartNoControllerOwner is of type conditionChangeReason, and I thought it could be used as a string here, but that is wrong.
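The point about the conversion can be sketched as follows. The definitions of `conditionChangeReason` and `expectedEvent` below are assumptions reconstructed from how they are used in this thread, not copies of the operator's code, and `newExpectedWarning` is a hypothetical helper. Go does not implicitly convert a value of a named string type to `string`, so the struct field needs an explicit `string(...)` conversion.

```go
package main

import "fmt"

// conditionChangeReason is assumed to be a named string type, as the review
// comment describes; this definition is for illustration only.
type conditionChangeReason string

const remediationCannotStartNoControllerOwner conditionChangeReason = "RemediationCannotStartNoControllerOwner"

// expectedEvent is a hypothetical stand-in for the test helper struct used
// with verifyEvents; the field names are assumptions.
type expectedEvent struct {
	eventType string
	reason    string
	message   string
	expected  bool
}

// newExpectedWarning (hypothetical) builds an expected warning event from a
// typed reason.
func newExpectedWarning(reason conditionChangeReason, message string) expectedEvent {
	return expectedEvent{
		eventType: "Warning", // same string value as v1.EventTypeWarning
		// Writing `reason: reason` would not compile: a conditionChangeReason
		// is not assignable to a string field without a conversion.
		reason:   string(reason),
		message:  message,
		expected: true,
	}
}

func main() {
	e := newExpectedWarning(remediationCannotStartNoControllerOwner, "no controller owner")
	fmt.Println(e.reason)
}
```

An untyped string constant would not need the conversion, which is one reason a shared module might prefer to export reasons as plain constants.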

Contributor Author

Signed-off-by: Carlo Lobrano <c.lobrano@gmail.com>
Contributor Author

/test 4.15-openshift-e2e

@clobrano clobrano requested a review from razo7 May 28, 2024 12:02
go.mod Outdated
@@ -4,7 +4,7 @@ go 1.21

require (
github.com/go-logr/logr v1.4.1
- github.com/medik8s/common v1.13.0
+ github.com/medik8s/common v1.15.1
Member

NIT: There is already a newer version of common that we can use (see https://github.com/medik8s/common/tags)

2 participants