Addition of Optional Parameter - allow_Red_cluster to bypass ISM to run on a Red Opensearch Cluster #1189
Open

skumarp7 wants to merge 6 commits into opensearch-project:main from nokia:ismRedClusterFeature
Commits (6)
3ed3edb  Addition of Optional Parameter - allow_Red_cluster to bypass ISM to r… (skumarp7)
f19c0b5  Indendation Fix (skumarp7)
8fe97f1  Merge branch 'opensearch-project:main' into ismRedClusterFeature (skumarp7)
c7adb0b  Overriding state with default values of allowredcluster (skumarp7)
0977318  Fix missing brackets (skumarp7)
ada089e  Addition of test case (skumarp7)
@@ -5,6 +5,11 @@

package org.opensearch.indexmanagement.indexstatemanagement.runner

import org.apache.hc.core5.http.ContentType
import org.apache.hc.core5.http.io.entity.StringEntity
import org.opensearch.client.Request
import org.opensearch.client.Response
import org.opensearch.core.rest.RestStatus
import org.opensearch.indexmanagement.indexstatemanagement.ISMActionsParser
import org.opensearch.indexmanagement.indexstatemanagement.IndexStateManagementRestTestCase
import org.opensearch.indexmanagement.indexstatemanagement.action.OpenAction
@@ -224,4 +229,101 @@ class ManagedIndexRunnerIT : IndexStateManagementRestTestCase() {
            assertEquals("Failed to update ManagedIndexConfig jitter", newJitter, currJitter)
        }
    }
    fun `test runner on a red cluster with allow_red_cluster as false`() {
        val indexName = "test-index-1"
        val policyID = "test_red_cluster_policy"
        val policy =
            """
            {"policy":{"description":"Close indices older than 5m","default_state":"active","states":[{"name":"active","allow_red_cluster":"false",
            "actions":[],"transitions":[{"state_name":"inactivestate","conditions":{"min_index_age":"5s"}}]},{"name":"inactivestate","allow_red_cluster":"false"
            ,"actions":[{"delete":{}}],"transitions":[]}],"ism_template":{"index_patterns":["test-index"]}}}
            """.trimIndent()
        createPolicyJson(policy, policyID)
        createIndex(indexName, policyID)
        waitFor { assertIndexExists(indexName) }

        val managedIndexConfig = getExistingManagedIndexConfig(indexName)

        val endpoint = "sim-red"
        val jsonEntity = """{"settings":{"index.routing.allocation.require.size": "test"}}"""
        val request = Request("PUT", endpoint)
        request.entity = StringEntity(jsonEntity, ContentType.APPLICATION_JSON)
        val response: Response = client().performRequest(request)
        assertEquals("Failed to simulate red cluster", RestStatus.OK, response.restStatus())

        // Change the start time so the job will trigger in 2 seconds.
        // After the job, the index will be in the "active" state.
        updateManagedIndexConfigStartTime(managedIndexConfig)

        waitFor { assertEquals(policyID, getExplainManagedIndexMetaData(indexName).policyID) }

        // Change the start time so the job will trigger in 2 seconds.
        // The index transitions to the "inactivestate" state.
        updateManagedIndexConfigStartTime(managedIndexConfig)

        // Wait for the index to settle in "inactivestate".
        Thread.sleep(8000L)

        // Change the start time so the job will trigger in 2 seconds.
        updateManagedIndexConfigStartTime(managedIndexConfig)

        Thread.sleep(5000L)

        // With allow_red_cluster as false the delete action is blocked, so the index must still exist.
        waitFor { assertIndexExists(indexName) }

        val deleteReq = Request("DELETE", endpoint)
        val deleteRes: Response = client().performRequest(deleteReq)
        assertEquals("Failed to delete Index $endpoint", RestStatus.OK, deleteRes.restStatus())
        isClusterGreen("30s")
    }
    fun `test runner on a red cluster with allow_red_cluster as true`() {
        val indexName = "test-index-2"
        val policyID = "test_red_cluster_policy"
        val policy =
            """
            {"policy":{"description":"Close indices older than 5m","default_state":"active","states":[{"name":"active","allow_red_cluster":"true",
            "actions":[],"transitions":[{"state_name":"inactivestate","conditions":{"min_index_age":"5s"}}]},{"name":"inactivestate","allow_red_cluster":"true"
            ,"actions":[{"delete":{}}],"transitions":[]}],"ism_template":{"index_patterns":["test-index"]}}}
            """.trimIndent()
        createPolicyJson(policy, policyID)
        createIndex(indexName, policyID)
        waitFor { assertIndexExists(indexName) }

        val managedIndexConfig = getExistingManagedIndexConfig(indexName)

        val endpoint = "sim-red"
Review comment on this line: Would it make sense to not use a different index, but just use
        val jsonEntity = """{"settings":{"index.routing.allocation.require.size": "test"}}"""
        val request = Request("PUT", endpoint)
        request.entity = StringEntity(jsonEntity, ContentType.APPLICATION_JSON)
        val response: Response = client().performRequest(request)
        assertEquals("Failed to simulate red cluster", RestStatus.OK, response.restStatus())

        // Change the start time so the job will trigger in 2 seconds.
        // After the job, the index will be in the "active" state.
        updateManagedIndexConfigStartTime(managedIndexConfig)

        waitFor { assertEquals(policyID, getExplainManagedIndexMetaData(indexName).policyID) }

        // Change the start time so the job will trigger in 2 seconds.
        // The index transitions to the "inactivestate" state.
        updateManagedIndexConfigStartTime(managedIndexConfig)

        // Wait for the index to settle in "inactivestate".
        Thread.sleep(8000L)

        // Change the start time so the job will trigger in 2 seconds.
        updateManagedIndexConfigStartTime(managedIndexConfig)

        // Wait for the index deletion by the ISM job.
        Thread.sleep(5000L)

        // With allow_red_cluster as true the delete action runs even on a red cluster.
        waitFor { assertIndexDoesNotExist(indexName) }

        val deleteReq = Request("DELETE", endpoint)
        val deleteRes: Response = client().performRequest(deleteReq)
        assertEquals("Failed to delete Index $endpoint", RestStatus.OK, deleteRes.restStatus())
        isClusterGreen("30s")
    }
}
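The two tests above exercise the new flag end to end: with `allow_red_cluster` false the delete action never runs on a red cluster, and with it true the delete proceeds. As a rough sketch of the gate they assert (`ClusterHealth`, `IsmState`, and `shouldExecute` are hypothetical simplifications for illustration, not the plugin's actual types):

```kotlin
// Simplified stand-in for cluster status as reported by cluster health.
enum class ClusterHealth { GREEN, YELLOW, RED }

// Simplified stand-in for an ISM state carrying the new optional flag.
data class IsmState(val name: String, val allowRedCluster: Boolean = false)

// The runner-level decision sketched by the tests: skip execution on a
// red cluster unless the current state opts in via allow_red_cluster.
fun shouldExecute(health: ClusterHealth, state: IsmState): Boolean =
    health != ClusterHealth.RED || state.allowRedCluster

fun main() {
    println(shouldExecute(ClusterHealth.RED, IsmState("inactivestate", allowRedCluster = false))) // false
    println(shouldExecute(ClusterHealth.RED, IsmState("inactivestate", allowRedCluster = true)))  // true
    println(shouldExecute(ClusterHealth.GREEN, IsmState("active")))                               // true
}
```

On a green or yellow cluster the flag is irrelevant; it only widens behavior when the cluster is red.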
bowenlan-amzn: We shouldn't move this red-cluster check down, because it helps avoid making one or more calls below that could potentially worsen a red cluster.

Instead, we would need to rely on the ManagedIndexConfig, which we have directly here; it holds a snapshot of the policy. Only when the policy has allow_red_cluster would we bypass this check. The problem is that ManagedIndexConfig doesn't know which state or action is running; only the metadata knows. So we can't bypass only while a specific state or action is running. To do that, more complex logic would be needed to update ManagedIndexConfig with the required info.
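The constraint described above can be sketched as follows. This is a minimal illustration under assumed, simplified stand-in types (`State`, `Policy`, `ManagedIndexConfig` here are hypothetical reductions of the real classes): because the config only has a policy snapshot and no knowledge of the running state, the only safe whole-job bypass is one that consults the snapshot alone, e.g. requiring every state to opt in.

```kotlin
// Hypothetical, simplified stand-ins for the ISM model classes.
data class State(val name: String, val allowRedCluster: Boolean)
data class Policy(val states: List<State>)
data class ManagedIndexConfig(val policy: Policy)

// Sketch: bypass the red-cluster check only when every state in the
// snapshot policy opts in, since the config cannot tell which state
// is currently executing.
fun shouldBypassRedClusterCheck(config: ManagedIndexConfig): Boolean =
    config.policy.states.isNotEmpty() && config.policy.states.all { it.allowRedCluster }

fun main() {
    val optedIn = ManagedIndexConfig(Policy(listOf(
        State("active", true), State("inactivestate", true))))
    val mixed = ManagedIndexConfig(Policy(listOf(
        State("active", true), State("inactivestate", false))))
    println(shouldBypassRedClusterCheck(optedIn)) // true
    println(shouldBypassRedClusterCheck(mixed))   // false
}
```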
skumarp7: Hi @bowenlan-amzn,

The calls we're making to the cluster are mostly GET operations (to fetch state, etc.), so I'm not sure how they would worsen a red cluster. I can still try to avoid them, but I'm not sure of the right approach.

This is my understanding of your suggestion; please correct me where required: at this stage, we don't know for which state this is going to be executed, as we cannot run any GET calls.

Challenge: for a state that didn't have allow_red_cluster set to true, we would still execute those 1-2 GET calls before skipping it, which again could potentially worsen the cluster.

Question: that would mean we run all states/actions (if the flag is true), or run nothing (if the flag is false). But then the concern arises that any ISM action that doesn't support a red cluster, or would harm one, would also be bypassed to run on a red cluster.
bowenlan-amzn: Right, these are GET calls, but considering that this small change will affect all ISM jobs, which can number 10k+ in a cluster, we should still keep the original logic: don't do anything if the cluster is red.

We don't need new logic in ManagedIndexConfig; it already has the policy in it (see index-management/src/main/kotlin/org/opensearch/indexmanagement/indexstatemanagement/model/ManagedIndexConfig.kt, line 38 in dbd2bc2).

I'm thinking of a slightly hacky solution: maybe have something like allow_red_cluster_after: 3d, then compare the current time with jobEnabledTime, which seems to be the time when the job is created. That way we can calculate when to start allowing a red cluster; before then, we still block execution on a red cluster.
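The allow_red_cluster_after idea sketched in the comment above reduces to a simple time comparison. This is a hedged illustration only; `redClusterAllowed` and its parameters are assumptions based on the discussion, not the plugin's actual fields:

```kotlin
import java.time.Duration
import java.time.Instant

// Sketch of the suggested check: allow execution on a red cluster only
// once the configured duration has elapsed since the job was enabled.
fun redClusterAllowed(
    jobEnabledTime: Instant,
    allowRedClusterAfter: Duration,
    now: Instant = Instant.now(),
): Boolean = now.isAfter(jobEnabledTime.plus(allowRedClusterAfter))

fun main() {
    val enabled = Instant.parse("2024-01-01T00:00:00Z")
    val threshold = Duration.ofDays(3)
    // Day 2 since the job was enabled: still blocked on a red cluster.
    println(redClusterAllowed(enabled, threshold, enabled.plus(Duration.ofDays(2))))  // false
    // Day 10: the check passes even if the cluster only just turned red.
    println(redClusterAllowed(enabled, threshold, enabled.plus(Duration.ofDays(10)))) // true
}
```

Note the second case: once the current time exceeds jobEnabledTime plus the threshold, the check passes regardless of how long the cluster has actually been red, which is the edge case questioned in the reply that follows.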
skumarp7: Hi @bowenlan-amzn,

I'm guessing the idea being suggested here is that we give the user a time window to decide, such that if the cluster remains in a red state for that long, then ISM would be allowed to run.

A few follow-up questions:

Say the policy triggered the job on day 1 and the cluster state goes red after 10 days. The job would immediately be triggered on the red cluster, as the difference between jobEnabledTime and the current time would already have exceeded 3 days. In this case, the red cluster would be bypassed regardless of the 3d duration set in the parameter, and it would not wait in the red state for 3d. Is this understanding correct?

Is there any way to retrieve how long the cluster state has been red? Just thinking out loud on whether we could instead check the duration for which the cluster state has been red and compare that with allow_red_cluster_after: 3d.