
Compatibility with segment replication #407

Closed
Tracked by #8211
dreamer-89 opened this issue Jun 29, 2023 · 15 comments
Labels
enhancement, untriaged, v2.9.0

Comments


dreamer-89 commented Jun 29, 2023

Summary

With the 2.9.0 release, many enhancements are going into the segment replication feature [1][2] (GA since 2.7.0), so we need to ensure the different plugins are compatible with the current state of the feature. Previously, we ran tests on plugin repos to verify this compatibility, but we want plugin owners to be aware of these changes so that any required updates can be made. With the 2.10.0 release, the remote store feature is going GA; it internally uses the SEGMENT replication strategy exclusively, i.e. it enforces SEGMENT replication for all indices. It is therefore important to validate that plugins are compatible with segment replication.

What changed

1. Refresh policy behavior

  1. RefreshPolicy.IMMEDIATE refreshes only the primary shards immediately, not the replica shards. Instead, after the refresh, the primary starts a round of segment replication to update the replica shard copies, leading to eventual consistency.
  2. RefreshPolicy.WAIT_UNTIL normally ensures the indexing operation is searchable across the cluster, i.e. a read-after-write (RAW) guarantee. With segment replication, this guarantee no longer holds, because replica shard updates are delayed by asynchronous background refreshes.
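To make the eventual-consistency behavior above concrete, here is a toy, self-contained model (all class and method names are hypothetical, not OpenSearch APIs): a refresh publishes segments on the primary only, and the replica becomes consistent only after a separate segment-copy round completes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of SEGMENT replication visibility (names hypothetical).
class SegRepModel {
    private final List<String> primaryBuffer = new ArrayList<>();   // indexed but unrefreshed docs
    final List<String> primarySegments = new ArrayList<>();         // searchable on the primary
    final List<String> replicaSegments = new ArrayList<>();         // searchable on the replica

    void index(String doc) { primaryBuffer.add(doc); }

    // RefreshPolicy.IMMEDIATE analogue: refresh makes docs searchable on the primary only.
    void refreshPrimary() {
        primarySegments.addAll(primaryBuffer);
        primaryBuffer.clear();
    }

    // The replica catches up only when the async segment-copy round completes.
    void copySegmentsToReplica() {
        replicaSegments.clear();
        replicaSegments.addAll(primarySegments);
    }

    public static void main(String[] args) {
        SegRepModel shard = new SegRepModel();
        shard.index("doc1");
        shard.refreshPrimary();
        System.out.println(shard.primarySegments.contains("doc1")); // true: visible on primary
        System.out.println(shard.replicaSegments.contains("doc1")); // false: replica still lagging
        shard.copySegmentsToReplica();
        System.out.println(shard.replicaSegments.contains("doc1")); // true: eventually consistent
    }
}
```

The gap between the second and third print is exactly the window in which a read served by a replica misses the write.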

2. Refresh lag on replicas

With segment replication, there is an inherent delay before documents become searchable on replica shard copies, because replicas copy segment (data) files from the primary. Compared to document replication, the average amount of time during which replica shards are inconsistent with their primaries will therefore increase.

3. System/hidden indices support

With opensearch-project/OpenSearch#8200, system and hidden indices are now supported with the SEGMENT replication strategy. We need to ensure there are no bottlenecks that prevent using system/hidden indices with segment replication.

Next steps

Segment replication does not guarantee strong reads. Thus, if a plugin needs strong read guarantees, especially in light of the changed refresh policy behavior and the replica lag (points 1 and 2 above), its search requests must be updated to target primary shards only. With opensearch-project/OpenSearch#7375, core now supports primary-shard-only search. Please follow the documentation for examples and details.
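As a hedged illustration of primary-shard-only search, a request of the following shape targets primary copies (the `preference=_primary` value is what core #7375 adds; verify the exact parameter name against the documentation for your OpenSearch version):

```
GET /my-index/_search?preference=_primary
```

This trades replica read scaling for read-after-write consistency, so it should be reserved for requests that genuinely need strong reads.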

Open questions

In case of any questions or issues, please post them in the core issue.

Reference

[1] Design

[2] Documentation

@dreamer-89 (Member, Author)

Requesting owners to add the v2.9.0 label to this issue.

@gaiksaya added the v2.9.0 label Jul 3, 2023

owaiskazi19 commented Jul 5, 2023

-> First, check whether we use RefreshPolicy.IMMEDIATE or RefreshPolicy.WAIT_UNTIL anywhere in the job-scheduler tests. If yes, check that those tests pass with segment replication.
-> Next, check whether any of the existing tests fail with segment replication. If they fail, verify whether it is because a replica has not caught up to its primary; if so, add a wait condition until the replica has caught up.
-> Next, make sure we have tests in the job-scheduler repo that create system indices, and verify that system/hidden indices in job-scheduler work correctly with segment replication. Before verifying this step, make sure this PR is merged: opensearch-project/OpenSearch#8200 (will be merged soon).
To enable segment replication, all you need to do is set cluster.indices.replication.strategy: 'SEGMENT' in the opensearch.yml file. If you are building the core tarball locally, I suggest changing this value to ReplicationType.SEGMENT and this value to SEGMENT. This is a hack, but if you are building the core tarball locally it is simpler.
cc: @Rishikesh1159
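As a concrete sketch of the setting described above, this is the only opensearch.yml change needed to make SEGMENT the cluster-wide default (the quoted value follows the comment above; an unquoted SEGMENT should also work):

```yaml
# opensearch.yml: default all newly created indices to segment replication
cluster.indices.replication.strategy: 'SEGMENT'
```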


dbwiddis commented Jul 6, 2023

The only use of any RefreshPolicy in this repo is in the Sample Extension Plugin, in a sample REST handler.

That REST API is tested here, but the test doesn't rely on checking replicas:

```java
protected SampleJobParameter createWatcherJobWithClient(RestClient client, String jobId, SampleJobParameter jobParameter)
    throws IOException {
    Map<String, String> params = getJobParameterAsMap(jobId, jobParameter);
    Response response = makeRequest(client, "POST", SampleExtensionRestHandler.WATCH_INDEX_URI, params, null);
    Assert.assertEquals("Unable to create a watcher job", RestStatus.OK, RestStatus.fromCode(response.getStatusLine().getStatusCode()));
    Map<String, Object> responseJson = JsonXContent.jsonXContent.createParser(
        NamedXContentRegistry.EMPTY,
        LoggingDeprecationHandler.INSTANCE,
        response.getEntity().getContent()
    ).map();
    return getJobParameter(client, responseJson.get("_id").toString());
}
```

So I wouldn't expect any failures in the first two steps.


minalsha commented Jul 6, 2023

#407 (comment): "Before verifying this step make sure this PR: opensearch-project/OpenSearch#8200 is merged. (will be merged soon)." @Rishikesh1159, this PR (#8200) is not merged yet, and today is 7/6. Can you please get this PR merged today? cc @anasalkouz


dbwiddis commented Jul 6, 2023

Looks like many of the tests in LockServiceIT exercise updates to the system index when creating or deleting locks, in particular the multi-threaded test testMultiThreadCreateLock(), which would likely hit multiple nodes and could fail.

Given that these locks need to be replicated to be of any use, I expect this test would likely fail (at least intermittently), and the action needed on this issue is to change the lock-acquiring code in LockService to require RefreshPolicy.WAIT_UNTIL.
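The proposed WAIT_UNTIL behavior can be sketched with a toy, self-contained model (all names are hypothetical, not the actual LockService API): lock acquisition does not return until the lock document is visible on the replica copy, so a second node cannot miss an existing lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy model of WAIT_UNTIL-style lock acquisition (names hypothetical).
class WaitUntilLockModel {
    final ConcurrentMap<String, String> primary = new ConcurrentHashMap<>();
    final ConcurrentMap<String, String> replica = new ConcurrentHashMap<>();

    // Returns true if this owner won the lock. Like RefreshPolicy.WAIT_UNTIL,
    // it does not return until the lock doc is visible on every copy.
    boolean acquireLock(String lockId, String owner) {
        boolean won = primary.putIfAbsent(lockId, owner) == null;
        if (won) {
            replica.put(lockId, owner);  // replicate before returning to the caller
        }
        return won;
    }

    public static void main(String[] args) {
        WaitUntilLockModel locks = new WaitUntilLockModel();
        System.out.println(locks.acquireLock("job-1", "node-a")); // true: lock won
        System.out.println(locks.replica.containsKey("job-1"));   // true: visible on replica
        System.out.println(locks.acquireLock("job-1", "node-b")); // false: lock already held
    }
}
```

The cost is that every acquisition now waits on replication, which is the performance concern raised in the next comment.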


dbwiddis commented Jul 6, 2023

However, the above strategy could have a performance impact, particularly for plugins that use the lock service (I'm thinking Real Time HCAD, @kaituo):

However, there are perf implications here that need to be called out. SegRep will, on average, increase the amount of time that replica shards are inconsistent with their primaries due to the time required to copy out segments (time dependent on cluster config/hardware, etc.). System indices are generally small and should copy out quickly, so I wouldn't expect this to be of major concern. Are there cases where stronger read/write consistency is desired on the system index? If so, plugins would need to prioritize primaries for searches.

Originally posted by @mch2 in opensearch-project/OpenSearch#8182 (comment)


minalsha commented Jul 6, 2023

@saratvemulapalli / @joshpalis / @owaiskazi19 can you please see comment #407 (comment) and share your feedback on whether the approach is the right way to go? @kaituo, please share your thoughts as well, since this could have a performance impact.

@joshpalis (Member)

Given that these locks need to be replicated to be of any use, I expect this test would likely fail (at least intermittently), and the action needed on this issue is to change the lock-acquiring code in LockService to require RefreshPolicy.WAIT_UNTIL.

I agree with this approach, but I doubt that the performance impact would be significant, as the number of documents indexed would be relatively small. A lock (document) is indexed prior to the start of a job execution and is the sole lock used for that job (and all subsequent executions of the job).


Rishikesh1159 commented Jul 6, 2023

-> First we need to check if we use RefreshPolicy.IMMEDIATE or RefreshPolicy.WAIT_UNTIL anywhere in any of our job-scheduler tests? If Yes we need to check if they pass with segment replication. -> Next check if any of the existing tests fail with segment replication? If they fail, we need to verify if they because replica not caught up to primary? If yes we need to add some wait condition until replica has caught up to primary. -> Next make sure that we have some tests in job-scheduler repo which create system indices. We need to verify that if system/hidden indices in job scheduler work correctly with segment replication. Before verifying this step make sure this PR: opensearch-project/OpenSearch#8200 is merged. (will be merged soon). To enable segment replication all you need to do is pass cluster.indices.replication.strategy: 'SEGMENT' in opensearch.yml file. If you are building core tarball locally, I suggest to change this value to ReplicationType.SEGMENT and this value to SEGMENT. Although this is a hack but if you are building core tarball locally this will be simpler. cc: @Rishikesh1159

Sorry, a correction is needed to the above comment. To enable segment replication when you are building the OpenSearch core tarball locally and using it with a plugin, you need to change this in IndicesService to ReplicationType.SEGMENT; I incorrectly pointed you to a change in IndexMetadata, which is not needed. (Again, this is a hack to quickly build a tarball with segrep enabled. The correct way to enable segrep is still to set cluster.indices.replication.strategy: 'SEGMENT' in the opensearch.yml file.)


dbwiddis commented Jul 6, 2023

Hey @Rishikesh1159, thanks for the clarity.

I would like the Job Scheduler lock index (a system index) to always choose DOCUMENT, overriding any cluster settings, as discussed in opensearch-project/OpenSearch#8211 (comment).

If so I think this task boils down to:

  • run tests with the cluster set to SEGMENT and see if they fail (they probably will for the lock system index)
  • change the code for that system index to always set it to DOCUMENT
  • run tests again and see if they pass

I believe this meets the intent of this issue and also preserves the utility of a performance-sensitive "lock" index by making its replicas consistent as soon as possible.
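The second step above could be sketched as a per-index setting at index creation time. Both the setting name `index.replication.type` and the index name below are assumptions for illustration; verify them against your OpenSearch version and the actual lock index name used by Job Scheduler:

```
PUT /.opendistro-job-scheduler-lock
{
  "settings": {
    "index.replication.type": "DOCUMENT"
  }
}
```

In the plugin itself, this would be set in the index-creation settings rather than via a REST call, since system indices are not normally created by clients.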


saratvemulapalli commented Jul 7, 2023

Problem

Searching through the code for all IndexRequests [1], there are 3 different use cases:

  • LockService[2]
  • Downstream Plugins like AD, ISM etc [3]
  • JobDetailsService[4]

None of the Job Scheduler code base uses RefreshPolicy, as Dan already called out in #407 (comment).
When undefined, RefreshPolicy defaults to false [5], which essentially means the refresh happens at the default refresh interval, at some point after the response is sent back.

JobDetailsService is only used to populate the job details index once in the lifetime of an extension (a newly added feature) and does not really impact any downstream plugins.

LockService will be tricky, since it's used by downstream plugins to acquire a lock. Given the way these jobs are triggered, it is unlikely that a lock will be acquired within the default refresh interval, but we should stress test it. In any case, as @joshpalis said, the number of documents being ingested in production is tiny, and I believe we'll see no impact for JS.

@dreamer-89 @Rishikesh1159 let me know if we missed anything else.

Next Steps

  • As @dbwiddis suggested, stress test LockService with ITs; it should be OK for us to use SegRep, as searches are randomized and limited to number of workers = number of nodes in the cluster with JS installed.
  • Reach out to downstream plugins and make sure they add integration tests with SegRep for Job indices they configure.
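The stress test in the first bullet could be shaped roughly like this self-contained sketch (names hypothetical; a real IT would drive LockService against a SegRep-enabled cluster): many concurrent workers race for the same lock, and exactly one must win.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy multi-threaded lock stress test (names hypothetical).
class LockStressSketch {
    public static void main(String[] args) throws Exception {
        ConcurrentMap<String, String> lockIndex = new ConcurrentHashMap<>();
        int workers = 16;
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        for (int i = 0; i < workers; i++) {
            final String owner = "worker-" + i;
            pool.submit(() -> {
                start.await();  // release all workers at once to maximize contention
                if (lockIndex.putIfAbsent("job-1", owner) == null) {
                    winners.incrementAndGet();
                }
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(winners.get()); // 1: exactly one worker acquired the lock
    }
}
```

A real test would replace the map with LockService.acquireLock() calls and assert the same invariant: the lock is granted exactly once, even when reads may be served by lagging replicas.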

Refs

[1] https://github.com/search?q=repo%3Aopensearch-project%2Fjob-scheduler%20IndexRequest&type=code
[2] https://github.com/opensearch-project/job-scheduler/blob/main/spi/src/main/java/org/opensearch/jobscheduler/spi/utils/LockService.java
[3] https://github.com/search?q=org%3Aopensearch-project%20org.opensearch%3Aopensearch-job-scheduler-spi&type=code
[4] https://github.com/opensearch-project/job-scheduler/blob/main/src/main/java/org/opensearch/jobscheduler/utils/JobDetailsService.java#L24
[5] https://opensearch.org/docs/latest/api-reference/document-apis/index-document/#url-parameters


dbwiddis commented Jul 7, 2023

LockService will be tricky, since it's used by downstream plugins to acquire a lock. Given the way these jobs are triggered, it is unlikely that a lock will be acquired within the default refresh interval, but we should stress test it.

Just checked with @mch2: because we use "Get by ID", we presently get the consistency we need, but as I understand from him, that may not yet provide the same guarantees under SEGREP. I believe he'll be posting a comment here shortly with more details.


dbwiddis commented Jul 9, 2023

Comment posted in linked core issue: opensearch-project/OpenSearch#8211 (comment)

My plan to address this issue is outlined there.

@dbwiddis (Member)

Added an integ test trying to acquire the same lock, and tried many times to get it to fail under SegRep, without success (well, without failure, but you know what I mean).

I still defensively added the DOCUMENT index config just in case.

So I think the action here is complete.

@dbwiddis (Member)

Completed in #417
