Fix test_change_client_ocs_version_and_stop_heartbeat test #9395
fbalak commented Feb 29, 2024 (edited)
- Fixes #9394 (Fix alert message for StorageClientHeartbeatMissed)
- Fixes test_change_client_ocs_version_and_stop_heartbeat test case
- Updates check_alert_list function to correctly check alerts with same label but different message
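To illustrate the last point, here is a minimal sketch of checking that one alert label carries several distinct messages. It is not the actual `check_alert_list` implementation: the helper name `missing_alert_messages`, the `message` annotation key and the sample message strings are assumptions made for illustration. Only the `labels`/`annotations` layout follows the Prometheus alerts API.

```python
def missing_alert_messages(alerts, label, expected_messages, annotation="message"):
    """Return the expected messages that no alert with the given label carries.

    Hypothetical helper: `alerts` is a list of dicts shaped like entries from
    the Prometheus /api/v1/alerts response. The annotation key is an assumption.
    """
    seen = {
        alert.get("annotations", {}).get(annotation)
        for alert in alerts
        if alert.get("labels", {}).get("alertname") == label
    }
    # Every expected message must be found on some alert with this label;
    # matching only the first alert with the label would miss the others.
    return [msg for msg in expected_messages if msg not in seen]


# Illustrative data only; the real alert messages are not reproduced here.
sample_alerts = [
    {
        "labels": {"alertname": "StorageClientHeartbeatMissed", "severity": "warning"},
        "annotations": {"message": "heartbeat missed for 120 seconds"},
    },
    {
        "labels": {"alertname": "StorageClientHeartbeatMissed", "severity": "critical"},
        "annotations": {"message": "heartbeat missed for 300 seconds"},
    },
]

missing = missing_alert_messages(
    sample_alerts,
    "StorageClientHeartbeatMissed",
    ["heartbeat missed for 120 seconds", "heartbeat missed for 300 seconds"],
)
```

An empty `missing` list means every expected message was observed under the same label.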
Signed-off-by: fbalak <fbalak@redhat.com>
LGTM.
@@ -1177,7 +1177,7 @@ def change_client_version():
     nonlocal client
     nonlocal original_cluster
     # run_time of operation
-    run_time = 60 * 3
+    run_time = 60 * 7
@fbalak why is this value set to 60 * 7? Could you please clarify?
The StorageClientHeartbeatMissed description gives 120 s for Warning and 300 s for Critical.
The expressions in the Prometheus rules confirm this. Warning expression for reference:
[(time() - 120) > (ocs_storage_client_last_heartbeat > 0)](https://console-openshift-console.apps.ibm-baremetal1.qe.rh-ocs.com/monitoring/query-browser?query0=(time()%20-%20120)%20%3E%20(ocs_storage_client_last_heartbeat%20%3E%200))
StorageClientIncompatibleOperatorVersion fires immediately, as far as I understand; its description gives no interval.
So altogether 420 s should be enough, which is the same value Filip set here.
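The reasoning above can be sketched in a few lines of Python. This is only an illustration: the constant names and the `heartbeat_missed` helper are hypothetical, and the sum follows the reviewer's estimate rather than any documented rule.

```python
WARNING_AFTER_S = 120   # Warning threshold quoted from the alert description
CRITICAL_AFTER_S = 300  # Critical threshold quoted from the alert description


def heartbeat_missed(now, last_heartbeat, threshold_s):
    """Mirror (time() - threshold) > (ocs_storage_client_last_heartbeat > 0).

    The inner `> 0` filter drops clients that never reported a heartbeat;
    the outer comparison is true once the last heartbeat is older than
    `threshold_s` seconds. All arguments are in seconds.
    """
    return last_heartbeat > 0 and (now - threshold_s) > last_heartbeat


# Reviewer's estimate: wait long enough to cover both thresholds back to back.
run_time = WARNING_AFTER_S + CRITICAL_AFTER_S
```

With these values `run_time` equals `60 * 7`, the value set in the diff.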
LGTM
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: dahorak, DanielOsypenko, ebondare, fbalak
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
…torage#9395)
* update message of StorageClientHeartbeatMissed alert
* remove dot from the alert message
* update alert data
* increase alert collecting time
* update alert messages
* update check_alert_list to reflect multiple messages for one alert
* specify namespace in patch command
* fix alert dictionary keys
* fix severity level
---------
Signed-off-by: fbalak <fbalak@redhat.com>