
[BUG] [REMOTE STORE] Unable to index data if region key not provided in node attribute #9942

Open · rishabh6788 opened this issue Sep 8, 2023 · 13 comments
Labels: bug (Something isn't working), v2.11.0 (Issues and PRs related to version 2.11.0)

Comments

rishabh6788 (Contributor) commented Sep 8, 2023

Describe the bug
Following the changes from #9105, I created a 2.10.0 OpenSearch cluster with the remote-store settings mentioned below:

node.attr.remote_store.segment.repository: opensearch-infra-stack-test-new-setting-v4-repo
node.attr.remote_store.repository.opensearch-infra-stack-test-new-setting-v4-repo.type: s3
node.attr.remote_store.repository.opensearch-infra-stack-test-new-setting-v4-repo.settings:
  bucket: opensearch-infra-stack-test-new-setting-v4
  base_path: remote-store
node.attr.remote_store.translog.repository: opensearch-infra-stack-test-new-setting-v4-repo
node.attr.remote_store.state.repository: opensearch-infra-stack-test-new-setting-v4-repo

While running a benchmark against this cluster, I am seeing the failures below in the OpenSearch log:

[2023-09-08T18:39:02,831][INFO ][o.o.r.s.S3BlobContainer  ] [ip-10-0-5-34.ec2.internal] exception error from blob container for file translog-15.tlog
[2023-09-08T18:39:02,831][ERROR][o.o.i.t.t.BlobStoreTransferService] [ip-10-0-5-34.ec2.internal] Failed to upload blob translog-15.tlog
java.io.IOException: java.lang.IllegalArgumentException: region must not be blank or empty.
        at org.opensearch.repositories.s3.S3BlobContainer.asyncBlobUpload(S3BlobContainer.java:211) ~[?:?]
        at org.opensearch.index.translog.transfer.BlobStoreTransferService.uploadBlob(BlobStoreTransferService.java:147) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.index.translog.transfer.BlobStoreTransferService.lambda$uploadBlobs$2(BlobStoreTransferService.java:99) [opensearch-2.10.0.jar:2.10.0]
        at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
        at org.opensearch.index.translog.transfer.BlobStoreTransferService.uploadBlobs(BlobStoreTransferService.java:94) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.index.translog.transfer.TranslogTransferManager.transferSnapshot(TranslogTransferManager.java:152) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.index.translog.RemoteFsTranslog.upload(RemoteFsTranslog.java:321) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.index.translog.RemoteFsTranslog.prepareAndUpload(RemoteFsTranslog.java:294) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.index.translog.RemoteFsTranslog.ensureSynced(RemoteFsTranslog.java:244) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.index.translog.Translog.ensureSynced(Translog.java:835) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.index.translog.InternalTranslogManager.ensureTranslogSynced(InternalTranslogManager.java:178) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.index.engine.InternalEngine.ensureTranslogSynced(InternalEngine.java:605) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.index.shard.IndexShard.lambda$createTranslogSyncProcessor$44(IndexShard.java:4166) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.index.shard.IndexShard$6.write(IndexShard.java:4180) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:129) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:117) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.common.util.concurrent.BufferedAsyncIOProcessor.process(BufferedAsyncIOProcessor.java:80) [opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [opensearch-2.10.0.jar:2.10.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.lang.IllegalArgumentException: region must not be blank or empty.

However, when the region key is added to the node.attr.remote_store.repository.opensearch-infra-stack-test-new-setting-v4-repo.settings setting, it works fine and I am able to index data without any issue.

node.attr.remote_store.repository.opensearch-infra-stack-test-new-setting-v4-repo.settings:
  bucket: opensearch-infra-stack-test-new-setting-v4
  base_path: remote-store
  region: us-east-1

There should be a default for region if not provided explicitly in the node attribute.

To Reproduce
Steps to reproduce the behavior:

  1. Create an OpenSearch cluster with the above-mentioned remote-store settings.
  2. Try to index data into it manually or using opensearch-benchmark.
  3. Check the cluster logs.
  4. See the error.

Expected behavior
If region is not provided explicitly in the node attribute, a default should be used and no error should be thrown.


Host/Environment:

  • OS: AL2
  • Version: 2.10.0


rishabh6788 added the bug (Something isn't working) and untriaged labels on Sep 8, 2023
sachinpkale (Member) commented:
Hi @rishabh6788, this is not specific to remote store. To enable the remote store feature, you need to provide node attributes that will help create the repository. The repository-s3 plugin requires region to be provided while creating the repository.

If we create a repository to take a snapshot, region also needs to be provided, so the requirement comes from repository creation, not from the remote store feature.

Instead of providing region during S3 repository creation, you can also set a default region by adding the following to the yml file:

s3.client.default.region: us-east-1
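
For reference, a minimal opensearch.yml sketch combining this default-region client setting with the node attributes from the report above (repository and bucket names are the ones from this issue) would look like:

node.attr.remote_store.segment.repository: opensearch-infra-stack-test-new-setting-v4-repo
node.attr.remote_store.translog.repository: opensearch-infra-stack-test-new-setting-v4-repo
node.attr.remote_store.state.repository: opensearch-infra-stack-test-new-setting-v4-repo
node.attr.remote_store.repository.opensearch-infra-stack-test-new-setting-v4-repo.type: s3
node.attr.remote_store.repository.opensearch-infra-stack-test-new-setting-v4-repo.settings:
  bucket: opensearch-infra-stack-test-new-setting-v4
  base_path: remote-store

s3.client.default.region: us-east-1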

sachinpkale (Member) commented Sep 9, 2023

Also, the settings have not been updated in https://github.com/opensearch-project/OpenSearch/blob/2.x/distribution/src/config/opensearch.yml#L90-L100

The repo settings are missing there; we will get them added. Thanks for reporting this.

rishabh6788 (Contributor, Author) commented:
Thanks for the response @sachinpkale.
I was under the impression that to register a repository using the repository-s3 plugin, bucket is the only mandatory parameter and the rest are optional. It also seems to have worked here: the settings I copied from #9105 did not have region specified in the node attribute, yet the repository was created without the region key and the cluster with the remote-store feature came up healthy.
The SDK should be able to identify the region from the scope of the credentials it uses or from the instance metadata; is this not the case?

Also, since the exception Failed to upload blob translog-15.tlog came from here, I guessed it has something to do with core and not the repository-s3 plugin.

If I understand you correctly, a user who wants to use remote-store needs to either add the region key in node.attr.remote_store.repository.<s3-repo>.settings or set a default with s3.client.default.region: us-east-1.
Please correct me if I am wrong.

rishabh6788 (Contributor, Author) commented:
As per my discussion with @sachinpkale, region is a required parameter when using S3 buckets in any region other than us-west-2. Closing this issue.

tlfeng (Collaborator) commented Sep 12, 2023

It also seems to have worked here: the settings I copied from #9105 did not have region specified in the node attribute, yet the repository was created without the region key and the cluster with the remote-store feature came up healthy.

@rishabh6788 @sachinpkale I still have questions about the region parameter.
I validated that the OpenSearch cluster could be deployed correctly after your Sept. 6th change to apply the node.attr.remote_store.repository setting in commit opensearch-project/opensearch-cluster-cdk@de2874c, while a subsequent deployment failed on the night of Sept. 7th.

According to the S3 repository plugin documentation on the region parameter (https://www.elastic.co/guide/en/elasticsearch/plugins/7.10/repository-s3-client.html):

Generally, the SDK will correctly guess the signing region to use. It should be considered an expert level setting to support S3-compatible APIs that require v4 signatures and use a region other than the default us-east-1.

I don't think the region parameter was ever required before, so I still don't understand why region is needed after the recent code change.

I knew that endpoint is needed when connecting to an S3 bucket that is not in us-east-1 within 24 hours of its creation. For example, to connect to a bucket in us-west-2, I added this setting in the yml file: "s3.client.default.endpoint": "s3.us-west-2.amazonaws.com".

tlfeng reopened this on Sep 12, 2023
msfroh (Collaborator) commented Sep 12, 2023

As per my discussion with @sachinpkale, region is a required parameter when using S3 buckets in any region other than us-west-2.

Where would us-west-2 come from as a default? It wasn't the original AWS region (that was us-east-1).

The SDK should be able to use DefaultAwsRegionProviderChain to infer the region, especially if running on EC2.
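
For illustration, a minimal standalone sketch (not OpenSearch code; it assumes the AWS SDK for Java v2 regions module is on the classpath) of how that chain resolves a region:

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.regions.providers.DefaultAwsRegionProviderChain;

public class RegionProbe {
    public static void main(String[] args) {
        // The default chain checks, in order: the aws.region system property / AWS_REGION
        // environment variable, the shared config file (~/.aws/config), and finally the
        // EC2 instance metadata service, so a node running on EC2 should resolve its own region.
        Region region = new DefaultAwsRegionProviderChain().getRegion();
        System.out.println("Resolved region: " + region);
    }
}

If none of those sources yields a region, getRegion() throws an SdkClientException rather than silently falling back to a hard-coded default.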

rishabh6788 (Contributor, Author) commented Sep 12, 2023

If I understand correctly, the repository registration is now handled by the OpenSearch process and not by the repository-s3 plugin in a remote-store-enabled cluster. See https://github.com/opensearch-project/OpenSearch/pull/9105/files#diff-4f5019ebdeec5ac0e4908c525fa2d4d4276eb0bdeddc4870c5b406a49df2e869

The default region dependency was removed from repository-s3 in #7989, and it is not required when using the repository-s3 plugin to register a repo in any region.

tlfeng (Collaborator) commented Sep 13, 2023

I found that there is no default region for the node.attr.remote_store.repository settings.
I tested deploying an OpenSearch cluster in both the us-east-1 and us-west-2 regions using the CDK script without specifying the region node attribute (i.e., without commit opensearch-project/opensearch-cluster-cdk@51a42aa), and both clusters show the warning java.lang.IllegalArgumentException: region must not be blank or empty. during indexing.

It also means that region has become a mandatory setting for using the remote store feature.

I think there are 2 defects:

  1. The region of the remote storage should be identified automatically. As @msfroh mentioned above, since the SDK can infer the region using DefaultAwsRegionProviderChain and the cluster is hosted on EC2 instances, the region doesn't need to be specified.
  2. When the region cannot be identified, it should be an "error" instead of a "warning" in the log, because nothing will be uploaded to the remote storage (S3 bucket), which makes indexing fail.

tlfeng (Collaborator) commented Sep 13, 2023

@sachinpkale confirmed that the repository registration flow hasn't been changed as part of the remote store feature. The same behavior would exist when trying to register a repository for a snapshot.

I just realized that the region was previously also specified in the CDK script; it was just moved from a REST API call into the "node.attributes" setting in commit opensearch-project/opensearch-cluster-cdk@29f9a23. Indeed, this is not related to the remote store feature.

@rishabh6788 I think the next step may be to verify whether region truly needed to be specified before the above commit. According to the OpenSearch docs (https://opensearch.org/docs/2.9/tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore#amazon-s3), region is still an optional parameter in version 2.9.

rishabh6788 (Contributor, Author) commented:
In general, if you just want to verify whether we can register a snapshot repo without the region key and then upload snapshots to it, we can test that out.
I had added region in the code before just to be safe.
I will verify and confirm back here.

rishabh6788 (Contributor, Author) commented Sep 13, 2023

@sachinpkale @tlfeng I created a 2.10.0 single-node cluster with no extra features, just plain docrep, in the eu-west-1 region.
I installed the repository-s3 plugin and registered a snapshot repo using the command below:

curl -X PUT "localhost:9200/_snapshot/custom_s3_repository?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "test-regular-snapshot",
    "base_path": "regular-path"
  }
}
'

I already had the test-regular-snapshot bucket created in the same region.
I then took a snapshot and uploaded it using the command below:

curl -XPUT "http://localhost:9200/_snapshot/custom_s3_repository/1"
{"accepted":true}

I confirmed that the snapshot was uploaded successfully as well.

[2023-09-13T18:39:00,009][INFO ][o.o.s.SnapshotsService   ] [ip-10-0-3-52.eu-west-1.compute.internal] snapshot [custom_s3_repository:1/FbfDfYvFTdKpt0_ShnN5YQ
] completed with state [SUCCESS]

As you can see, the repository-s3 plugin doesn't need the region key explicitly provided to figure out which region we are working in, as long as the S3 bucket is in the same region as the EC2 instance on which the OpenSearch process is running.

If it is a mandatory parameter to make remote-store work, then we need to call this out and add it to the documentation until we figure out why it is not working as expected with remote-store enabled.

rishabh6788 (Contributor, Author) commented Sep 13, 2023

The issue is not with repository creation; the issue is with the data upload part when using the remote-store feature.
The repo is getting registered successfully, but when it comes to the data upload part it is failing. @sachinpkale

sachinpkale added the v2.11.0 (Issues and PRs related to version 2.11.0) label and removed the v2.10.0 label on Sep 23, 2023