Index Level Encryption plugin #12902

Open · wants to merge 10 commits into base: main
Conversation

@asonje commented Mar 25, 2024

Description

This pull request adds index-level encryption to OpenSearch, based on issue #3469. Each OpenSearch index is individually encrypted with user-provided encryption keys. A new cryptofs value for the index.store.type setting is introduced, which instantiates a CryptoDirectory that encrypts files as they are written and decrypts them as they are read.
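For illustration only, a minimal sketch of how an index might opt into the new store type, assuming the standard OpenSearch Settings API. Only the "cryptofs" value comes from this PR's description; any key-provider settings the plugin needs are omitted because they are not named here.

```java
import org.opensearch.common.settings.Settings;

// Hedged sketch: select the CryptoDirectory-backed store for a new index.
Settings indexSettings = Settings.builder()
        .put("index.store.type", "cryptofs")  // new store type from this PR
        .put("index.number_of_shards", 1)
        .build();
```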

Related Issues

Resolves #3469

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
@kumargu commented Sep 23, 2024

Hey @bruno-roustant,

I was reading https://github.com/apache/solr-sandbox/blob/main/ENCRYPTION.md and have a few questions, if you could please help answer them.

This solution provides the encryption of the Lucene index files at the Java level.

I believe there is no support for OS-level encryption anywhere in the solr-sandbox?

It stores the id of the encryption key in the commit metadata (and obviously the key secret is never stored).

What does "commit" metadata mean? How is this metadata managed when keys are added, deleted and rotated. Sorry, i haven't read the code pointers you have pointed, but would be helpful if you can give a high level overview.

Java-level encryption can be used when the OS-level encryption management is not possible (e.g. host machine managed by a cloud provider)

  1. For instances managed by a cloud provider, shouldn't it be the cloud provider's responsibility to manage keys?
  2. There's surely a huge benefit in not creating an additional Lucene file, but I think a 20% perf impact, and up to 60%, might be a huge dealbreaker for sensitive workloads.

@asonje Do you have any perf benchmarks with your implementation? Also, have you thought about how key rotation would work? Would we need to reindex?

@asonje commented Sep 24, 2024

@kumargu I don't have any performance data to share yet; I have so far focused on functionality and correctness. Also, in this implementation you would have to re-index in order to rotate the data keys.

@kumargu commented Sep 24, 2024

Ok, makes sense.

[1] I also think we should get to performance sooner (once you are convinced of correctness). I am hoping we'll have a very minimal penalty; let me know what your assumptions are.

[2] We should list out the scope of this PR -- listing features/enhancements that we will pick up post this PR, e.g. IV chunking for shards > 64 GB, snapshot support, seamless key rotation. I can help with some of those PRs. The scope will give us a clear idea of what we are delivering and ensure we are not taking one-way-door decisions.

[3] Question: When a key is disabled, what do we do with the CryptoDirectory for that key? Does it remain dangling?

Edit: @asonje I know you have put a lot of work into this PR, but I also want us to be open to the Solr implementation details that were brought up earlier. Thanks!

@asonje commented Sep 24, 2024

I also think we should get to performance sooner (once you are convinced of correctness). I am hoping we'll have a very minimal penalty; let me know what your assumptions are.

There will be a noticeable impact to performance of the crypto directory relative to the HybridDirectory. There is no crypto equivalent yet for MMapDirectory. That would be a good addition, as certain index file types are read via mmap for performance.

Question: When a key is disabled, what do we do with the CryptoDirectory for that key? Does it remain dangling?

Yes, if a key is disabled or access is revoked, we do not do anything explicitly to the crypto directory.

@kumargu commented Sep 24, 2024

There will be a noticeable impact to performance of the crypto directory relative to the HybridDirectory.

Having the numbers handy would be really helpful :)

There is no crypto equivalent yet for MMapDirectory. That would be a good addition, as certain index file types are read via mmap for performance.

Something to add in the todo list :)

@bruno-roustant commented:

I believe there is no support for OS-level encryption anywhere in the solr-sandbox?

Right, OS level encryption is managed differently.

What does "commit" metadata mean? How is this metadata managed when keys are added, deleted and rotated?

It means the custom 'user' metadata optionally stored in each Lucene segment when there is a commit.
The Lucene index is a stack of segments. Each time index updates are committed, Lucene creates a new segment. Each Lucene segment is composed of multiple files, including a metadata file managed by Lucene, which contains the names of the files in the segment as well as some custom 'user' metadata. The solr-sandbox encryption module leverages this custom metadata to store some data about the wrapped keys used in the segment. Because, by design, each file can be encrypted with a different key, there is not only one key defined in the metadata but potentially a list of keys.
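For illustration, a minimal sketch of how a key id (never the key secret) could be written to and read back from Lucene commit user data. setLiveCommitData and SegmentInfos.readLatestCommit are standard Lucene APIs; the "encryptionKeyId" metadata key is a hypothetical name, not taken from solr-sandbox.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;

final class CommitKeyMetadata {
    // Attach the key id (never the key secret) to the next commit's user data.
    static void tagCommit(IndexWriter writer, String keyId) throws IOException {
        Map<String, String> userData = new HashMap<>();
        userData.put("encryptionKeyId", keyId); // hypothetical metadata key name
        writer.setLiveCommitData(userData.entrySet());
        writer.commit();
    }

    // Read the key id back from the latest commit point's user data.
    static String readKeyId(Directory dir) throws IOException {
        SegmentInfos infos = SegmentInfos.readLatestCommit(dir);
        return infos.getUserData().get("encryptionKeyId");
    }
}
```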

For instances managed by a cloud provider, shouldn't it be the cloud provider's responsibility to manage keys?

Yes it is, but they manage an encryption key for the whole machine. It is not per tenant; it is per machine/volume.
A specific customer cannot bring their own key. The key is managed by the cloud provider, and admin rights allow decryption.

There's surely a huge benefit in not creating an additional Lucene file, but I think a 20% perf impact, and up to 60%, might be a huge dealbreaker for sensitive workloads.

Creating or not creating additional files does not impact perf. What impacts perf is the number of AES blocks to decrypt.
The 20%-60% perf impact has been measured with an encryption Directory on top of a MMapDirectory.
I guarantee that you will measure this impact with the approach in this PR. The only way to avoid it is to rely on OS-level encryption, because that allows the OS (1) to decrypt probably faster, but mainly (2) to cache decrypted blocks in the OS cache, so that the Lucene index format can be served from already-decrypted cached blocks. This is especially important for fuzzy queries (spell-check).
The question is whether end-users want to have their own unique keys and control them (no admin rights can read the data), or they prefer fast performance and it's OK to rely on OS-level encryption per index (but admin rights can read).

@kumargu commented Sep 28, 2024

@bruno-roustant thanks again for your detailed explanation.

The question is whether end-users want to have their own unique keys and control them (no admin rights can read the data), or they prefer fast performance and it's OK to rely on OS-level encryption per index (but admin rights can read).

Customer-managed keys (CMK) are seeing high adoption in the cloud (certainly on AWS). CMK support would allow building a multi-tenant environment that gives enterprise customers a lot of control over access management. The initial motivation of this PR calls this out very clearly. For multi-tenant support, I think [1] key management and [2] performance are equally important. This is what I have been missing in the journey of this PR so far.

I agree with you that we will not be able to use OS-level encryption (like dm-crypt) for such a use case, but can we leverage fscrypt? Did you consider it for the Solr use case, or was multi-tenancy not a requirement for Solr?


[1] Using fscrypt could help us meet the performance goals.
[2] The dynamic key rotation approach you took in solr-sandbox will most likely work for key rotation here as well.

@bruno-roustant commented Oct 2, 2024

Actually, I used the generic term "OS-level encryption" to include both the block-device level (dm-crypt) and the file-system level (fscrypt).
fscrypt is a good alternative if you accept that admin rights can read. From the fscrypt documentation and the fscrypt user-space tool:

  • It's worth emphasizing that none of these encryption solutions (dm-crypt or fscrypt) protect unlocked encrypted files from other users on the same system (that's the job of OS-level access control, such as UNIX file permissions), or from the cloud provider you may be running a virtual machine on.

  • fscrypt does not support encrypting files in-place. Instead, it supports marking an empty directory as encrypted. Beware that due to the characteristics of filesystems and storage devices, this may not properly protect the files, as their original contents may still be forensically recoverable from disk even after being deleted.

If we look at the definition of encryption at rest, fscrypt correctly meets the expectation, since it requires a key to unlock a directory. But when running on a cloud provider host, we don't know exactly when the data is at rest, and we may not accept read rights for admin users while the directory is unlocked (so not at rest).

So, yes, I considered the fscrypt option, and yes, multi-tenancy is a strong requirement for the encryption module in solr-sandbox. In the end, to be able to work without compromise on cloud provider hosts and guarantee that no admin rights can read, we chose encryption at the Java level even though it impacts performance. This is the encryption module in solr-sandbox.

@kumargu commented Oct 5, 2024

Sorry @bruno-roustant, I am often late to reply; this week has been busy.

As next steps, I am going to research how "admin" rights are managed in multi-tenant clouds -- for example, Amazon OpenSearch Service and Amazon OpenSearch Serverless have different multi-tenant architectures, i.e. isolation of user and admin rights is managed differently.

@kumargu commented Oct 7, 2024

Hi @asonje, were you able to gather any perf numbers? This would be important for understanding the trade-offs.

@asonje commented Oct 9, 2024

Hi @asonje, were you able to gather any perf numbers? This would be important for understanding the trade-offs.

With the crypto directory, performance (elapsed time) drops by about 20% compared with the 'niofs' directory and about 65% relative to the 'hybridfs' directory. As I mentioned already, there isn't yet a crypto equivalent of the mmap directory, so the comparison with the niofs directory is the more direct measure of the cost of encryption/decryption.

These measurements were collected on a single node using the stackoverflow workload in opensearch-benchmark.

@kumargu commented Oct 10, 2024

Thank you @asonje for the perf numbers. This is a very useful piece of information for the direction of this PR!

It seems opensearch-benchmark doesn't give us p99/p100 of the server-side latency; that would have been more insightful.

@kumargu commented Oct 10, 2024

Also, if you still have the numbers handy, could you please post them here:

  1. How long did the tests run? (That would help us understand whether we tested against a warm cache too.)
  2. The 20% degradation is compared to X ms, and the 60% against Y ms.

@asonje commented Oct 10, 2024

                              hybridfs      niofs         cryptofs
Indexing Throughput (docs/s)  244,472.50    207,397.80    179,615.00
Time Elapsed (s)              255.5         355           423.4

@kumargu The results shown are averaged over 5 runs. stackoverflow is an indexing workload (no search).

@kumargu commented Oct 10, 2024

Can we please run nyc_taxis, collecting both search and indexing metrics, preferably p99/p100 over a 5-minute window?

@kumargu commented Oct 22, 2024

@asonje requesting search & indexing benchmarks.

@asonje commented Oct 23, 2024

Here are the results:

Indexing                       hybridfs   niofs     cryptofs
Throughput (docs/s)            570,504    563,434   537,273
Time elapsed (s)               379        379       388

nyc_taxis range query          hybridfs   niofs     cryptofs
p99 (ms)                       233.3      1,395.1   4,463.6
p100 (ms)                      255.8      1,439.8   5,007.8
Throughput (ops/s)             25         25        25

nyc_taxis autohisto_agg query  hybridfs   niofs     cryptofs
p99 (ms)                       56.1       90.0      201.7
p100 (ms)                      72.5       101.6     216.2
Throughput (ops/s)             25         25        25

I also identified 2 potential optimizations. The first concerns a workaround to avoid using a forbidden API, FileChannel.read. The synchronized block could be replaced with i = channel.read(tmpBuffer, position), which improves performance significantly (see the sketch after the tables below). Here are the updated search results:

nyc_taxis range query          hybridfs   niofs     cryptofs*
p99 (ms)                       233.3      1,395.1   1,338.7
p100 (ms)                      255.8      1,439.8   1,370.2
Throughput (ops/s)             25         25        25

nyc_taxis autohisto_agg query  hybridfs   niofs     cryptofs*
p99 (ms)                       56.1       90.0      165.4
p100 (ms)                      72.5       101.6     171.7
Throughput (ops/s)             25         25        25
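For illustration, a rough sketch of the change described above, with hypothetical method and variable names; FileChannel.read(ByteBuffer, long) is the standard positional-read API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

final class PositionalReadSketch {
    // Current workaround (as described): readers serialize on a lock because
    // the channel's shared file pointer must be moved before each read.
    static int readLocked(FileChannel channel, ByteBuffer tmpBuffer, long position) throws IOException {
        synchronized (channel) {
            channel.position(position);
            return channel.read(tmpBuffer);
        }
    }

    // Proposed replacement: the positional overload reads at an absolute
    // offset without touching the channel's position, so no lock is needed.
    static int readPositional(FileChannel channel, ByteBuffer tmpBuffer, long position) throws IOException {
        return channel.read(tmpBuffer, position);
    }
}
```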

The second optimization would be to cache the Cipher objects to reduce the cost of reinitialization. This would improve cryptofs performance on the autohisto_agg query.
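As a hedged sketch of what such a cache might look like (the AES/CTR transformation and all names here are assumptions, not taken from this PR):

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// One Cipher per thread: Cipher.getInstance() performs provider lookup and
// allocation on every call, while re-initializing an existing instance with
// a fresh IV is comparatively cheap.
final class CipherCache {
    private static final ThreadLocal<Cipher> DECRYPTOR = ThreadLocal.withInitial(() -> {
        try {
            return Cipher.getInstance("AES/CTR/NoPadding"); // transformation is an assumption
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    });

    static Cipher decryptor(SecretKey key, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = DECRYPTOR.get();
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv)); // cheap re-init per read
        return cipher;
    }
}
```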

@kumargu commented Oct 23, 2024

Thank you @asonje! The numbers are exciting for range queries. Great work identifying the contention at the synchronized block. Looking forward to the Cipher cache optimisation.

[1] Since the FileChannel.read you mentioned is forbidden, can we use FileLock in shared mode?

[2] Throughput (ops/s) = 25 is too low for benchmarking. I was expecting it to be in the 1000s. @cwperks, could you suggest what we have been using for benchmarks?

@kumargu commented Oct 29, 2024

{"run-benchmark-test": "id_1"}

github-actions bot commented:

The Jenkins job URL is https://build.ci.opensearch.org/job/benchmark-pull-request/1540/. Final results will be published once the job is completed.

@opensearch-ci-bot commented:

Benchmark Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/1540/

Metric Task Value Unit
Cumulative indexing time of primary shards 238.882 min
Min cumulative indexing time across primary shards 238.882 min
Median cumulative indexing time across primary shards 238.882 min
Max cumulative indexing time across primary shards 238.882 min
Cumulative indexing throttle time of primary shards 0 min
Min cumulative indexing throttle time across primary shards 0 min
Median cumulative indexing throttle time across primary shards 0 min
Max cumulative indexing throttle time across primary shards 0 min
Cumulative merge time of primary shards 129.107 min
Cumulative merge count of primary shards 68
Min cumulative merge time across primary shards 129.107 min
Median cumulative merge time across primary shards 129.107 min
Max cumulative merge time across primary shards 129.107 min
Cumulative merge throttle time of primary shards 33.9212 min
Min cumulative merge throttle time across primary shards 33.9212 min
Median cumulative merge throttle time across primary shards 33.9212 min
Max cumulative merge throttle time across primary shards 33.9212 min
Cumulative refresh time of primary shards 15.9881 min
Cumulative refresh count of primary shards 142
Min cumulative refresh time across primary shards 15.9881 min
Median cumulative refresh time across primary shards 15.9881 min
Max cumulative refresh time across primary shards 15.9881 min
Cumulative flush time of primary shards 4.89562 min
Cumulative flush count of primary shards 32
Min cumulative flush time across primary shards 4.89562 min
Median cumulative flush time across primary shards 4.89562 min
Max cumulative flush time across primary shards 4.89562 min
Total Young Gen GC time 14.393 s
Total Young Gen GC count 349
Total Old Gen GC time 0 s
Total Old Gen GC count 0
Store size 28.6251 GB
Translog size 5.12227e-08 GB
Heap used for segments 0 MB
Heap used for doc values 0 MB
Heap used for terms 0 MB
Heap used for norms 0 MB
Heap used for points 0 MB
Heap used for stored fields 0 MB
Segment count 44
Min Throughput index 40916.2 docs/s
Mean Throughput index 43209.4 docs/s
Median Throughput index 43601.8 docs/s
Max Throughput index 45294.7 docs/s
50th percentile latency index 1693.77 ms
90th percentile latency index 2387.05 ms
99th percentile latency index 7894.79 ms
99.9th percentile latency index 13345.4 ms
99.99th percentile latency index 15885.4 ms
100th percentile latency index 16456.9 ms
50th percentile service time index 1693.87 ms
90th percentile service time index 2387.75 ms
99th percentile service time index 7894.94 ms
99.9th percentile service time index 13345.4 ms
99.99th percentile service time index 15885.4 ms
100th percentile service time index 16456.9 ms
error rate index 0.01 %
Min Throughput wait-until-merges-finish 0 ops/s
Mean Throughput wait-until-merges-finish 0 ops/s
Median Throughput wait-until-merges-finish 0 ops/s
Max Throughput wait-until-merges-finish 0 ops/s
100th percentile latency wait-until-merges-finish 294292 ms
100th percentile service time wait-until-merges-finish 294292 ms
error rate wait-until-merges-finish 0 %

@opensearch-ci-bot commented:

Benchmark Baseline Comparison Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/22/

Metric Task Baseline Contender Diff Unit
Cumulative indexing time of primary shards 226.829 238.882 12.0527 min
Min cumulative indexing time across primary shard 226.829 238.882 12.0527 min
Median cumulative indexing time across primary shard 226.829 238.882 12.0527 min
Max cumulative indexing time across primary shard 226.829 238.882 12.0527 min
Cumulative indexing throttle time of primary shards 0 0 0 min
Min cumulative indexing throttle time across primary shard 0 0 0 min
Median cumulative indexing throttle time across primary shard 0 0 0 min
Max cumulative indexing throttle time across primary shard 0 0 0 min
Cumulative merge time of primary shards 105.212 129.107 23.8943 min
Cumulative merge count of primary shards 68 68 0
Min cumulative merge time across primary shard 105.212 129.107 23.8943 min
Median cumulative merge time across primary shard 105.212 129.107 23.8943 min
Max cumulative merge time across primary shard 105.212 129.107 23.8943 min
Cumulative merge throttle time of primary shards 20.8878 33.9212 13.0335 min
Min cumulative merge throttle time across primary shard 20.8878 33.9212 13.0335 min
Median cumulative merge throttle time across primary shard 20.8878 33.9212 13.0335 min
Max cumulative merge throttle time across primary shard 20.8878 33.9212 13.0335 min
Cumulative refresh time of primary shards 14.1585 15.9881 1.82965 min
Cumulative refresh count of primary shards 136 142 6
Min cumulative refresh time across primary shard 14.1585 15.9881 1.82965 min
Median cumulative refresh time across primary shard 14.1585 15.9881 1.82965 min
Max cumulative refresh time across primary shard 14.1585 15.9881 1.82965 min
Cumulative flush time of primary shards 4.59807 4.89562 0.29755 min
Cumulative flush count of primary shards 35 32 -3
Min cumulative flush time across primary shard 4.59807 4.89562 0.29755 min
Median cumulative flush time across primary shard 4.59807 4.89562 0.29755 min
Max cumulative flush time across primary shard 4.59807 4.89562 0.29755 min
Total Young Gen GC time 14.164 14.393 0.229 s
Total Young Gen GC count 331 349 18
Total Old Gen GC time 0 0 0 s
Total Old Gen GC count 0 0 0
Store size 28.561 28.6251 0.06413 GB
Translog size 5.12227e-08 5.12227e-08 0 GB
Heap used for segments 0 0 0 MB
Heap used for doc values 0 0 0 MB
Heap used for terms 0 0 0 MB
Heap used for norms 0 0 0 MB
Heap used for points 0 0 0 MB
Heap used for stored fields 0 0 0 MB
Segment count 45 44 -1
Min Throughput index 44638.4 40916.2 -3722.24 docs/s
Mean Throughput index 46778.3 43209.4 -3568.84 docs/s
Median Throughput index 46243.2 43601.8 -2641.37 docs/s
Max Throughput index 50606.6 45294.7 -5311.91 docs/s
50th percentile latency index 1570.4 1693.77 123.37 ms
90th percentile latency index 2195.5 2387.05 191.552 ms
99th percentile latency index 7290.54 7894.79 604.253 ms
99.9th percentile latency index 13326.1 13345.4 19.2608 ms
99.99th percentile latency index 15968.1 15885.4 -82.6784 ms
100th percentile latency index 23315.6 16456.9 -6858.7 ms
50th percentile service time index 1570.29 1693.87 123.573 ms
90th percentile service time index 2196.58 2387.75 191.178 ms
99th percentile service time index 7287.7 7894.94 607.244 ms
99.9th percentile service time index 13326.1 13345.4 19.2608 ms
99.99th percentile service time index 15968.1 15885.4 -82.6784 ms
100th percentile service time index 23315.6 16456.9 -6858.7 ms
error rate index 0.00651381 0.0064704 -4e-05 %
Min Throughput wait-until-merges-finish 0.00313287 0.00339798 0.00027 ops/s
Mean Throughput wait-until-merges-finish 0.00313287 0.00339798 0.00027 ops/s
Median Throughput wait-until-merges-finish 0.00313287 0.00339798 0.00027 ops/s
Max Throughput wait-until-merges-finish 0.00313287 0.00339798 0.00027 ops/s
100th percentile latency wait-until-merges-finish 319196 294292 -24903.2 ms
100th percentile service time wait-until-merges-finish 319196 294292 -24903.2 ms
error rate wait-until-merges-finish 0 0 0 %

@kumargu commented Nov 8, 2024

@asonje, FYI, I ran a standard benchmark via commenting. Maybe you could remove the synchronized block and rerun the standard benchmark?

Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
@asonje commented Nov 9, 2024

{"run-benchmark-test": "id_2"}

github-actions bot commented Nov 9, 2024

The Jenkins job URL is https://build.ci.opensearch.org/job/benchmark-pull-request/1618/. Final results will be published once the job is completed.

@opensearch-ci-bot commented:

Benchmark Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/1618/

Metric Task Value Unit
Cumulative indexing time of primary shards 167.658 min
Min cumulative indexing time across primary shards 0 min
Median cumulative indexing time across primary shards 6.52187 min
Max cumulative indexing time across primary shards 125.447 min
Cumulative indexing throttle time of primary shards 0 min
Min cumulative indexing throttle time across primary shards 0 min
Median cumulative indexing throttle time across primary shards 0 min
Max cumulative indexing throttle time across primary shards 0 min
Cumulative merge time of primary shards 76.8255 min
Cumulative merge count of primary shards 82
Min cumulative merge time across primary shards 0 min
Median cumulative merge time across primary shards 1.20253 min
Max cumulative merge time across primary shards 69.8232 min
Cumulative merge throttle time of primary shards 28.686 min
Min cumulative merge throttle time across primary shards 0 min
Median cumulative merge throttle time across primary shards 0.19055 min
Max cumulative merge throttle time across primary shards 27.5956 min
Cumulative refresh time of primary shards 2.40717 min
Cumulative refresh count of primary shards 187
Min cumulative refresh time across primary shards 0 min
Median cumulative refresh time across primary shards 0.167683 min
Max cumulative refresh time across primary shards 1.28473 min
Cumulative flush time of primary shards 10.9901 min
Cumulative flush count of primary shards 108
Min cumulative flush time across primary shards 0 min
Median cumulative flush time across primary shards 0.559483 min
Max cumulative flush time across primary shards 7.92678 min
Total Young Gen GC time 13.311 s
Total Young Gen GC count 388
Total Old Gen GC time 0 s
Total Old Gen GC count 0
Store size 18.8594 GB
Translog size 1.02632e-06 GB
Heap used for segments 0 MB
Heap used for doc values 0 MB
Heap used for terms 0 MB
Heap used for norms 0 MB
Heap used for points 0 MB
Heap used for stored fields 0 MB
Segment count 8
Min Throughput index-append 79293.6 docs/s
Mean Throughput index-append 81955.2 docs/s
Median Throughput index-append 82151.8 docs/s
Max Throughput index-append 84177.4 docs/s
50th percentile latency index-append 438.723 ms
90th percentile latency index-append 604.241 ms
99th percentile latency index-append 1139.63 ms
99.9th percentile latency index-append 5744.85 ms
99.99th percentile latency index-append 7254.43 ms
100th percentile latency index-append 8112.11 ms
50th percentile service time index-append 438.702 ms
90th percentile service time index-append 604.214 ms
99th percentile service time index-append 1139.67 ms
99.9th percentile service time index-append 5744.85 ms
99.99th percentile service time index-append 7254.43 ms
100th percentile service time index-append 8112.11 ms
error rate index-append 0 %
Min Throughput wait-until-merges-finish 0.01 ops/s
Mean Throughput wait-until-merges-finish 0.01 ops/s
Median Throughput wait-until-merges-finish 0.01 ops/s
Max Throughput wait-until-merges-finish 0.01 ops/s
100th percentile latency wait-until-merges-finish 149662 ms
100th percentile service time wait-until-merges-finish 149662 ms
error rate wait-until-merges-finish 0 %
Min Throughput wait-until-merges-1-seg-finish 123.5 ops/s
Mean Throughput wait-until-merges-1-seg-finish 123.5 ops/s
Median Throughput wait-until-merges-1-seg-finish 123.5 ops/s
Max Throughput wait-until-merges-1-seg-finish 123.5 ops/s
100th percentile latency wait-until-merges-1-seg-finish 7.79346 ms
100th percentile service time wait-until-merges-1-seg-finish 7.79346 ms
error rate wait-until-merges-1-seg-finish 0 %

@opensearch-ci-bot commented:

Benchmark Baseline Comparison Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/23/

Metric Task Baseline Contender Diff Unit
Cumulative indexing time of primary shards 170.336 167.658 -2.67867 min
Min cumulative indexing time across primary shard 0 0 0 min
Median cumulative indexing time across primary shard 7.13363 6.52187 -0.61177 min
Max cumulative indexing time across primary shard 127.078 125.447 -1.63033 min
Cumulative indexing throttle time of primary shards 0 0 0 min
Min cumulative indexing throttle time across primary shard 0 0 0 min
Median cumulative indexing throttle time across primary shard 0 0 0 min
Max cumulative indexing throttle time across primary shard 0 0 0 min
Cumulative merge time of primary shards 76.2919 76.8255 0.53358 min
Cumulative merge count of primary shards 85 82 -3
Min cumulative merge time across primary shard 0 0 0 min
Median cumulative merge time across primary shard 1.09522 1.20253 0.10732 min
Max cumulative merge time across primary shard 68.8309 69.8232 0.99227 min
Cumulative merge throttle time of primary shards 34.2203 28.686 -5.5343 min
Min cumulative merge throttle time across primary shard 0 0 0 min
Median cumulative merge throttle time across primary shard 0.212167 0.19055 -0.02162 min
Max cumulative merge throttle time across primary shard 32.7638 27.5956 -5.16812 min
Cumulative refresh time of primary shards 1.77567 2.40717 0.6315 min
Cumulative refresh count of primary shards 200 187 -13
Min cumulative refresh time across primary shard 0 0 0 min
Median cumulative refresh time across primary shard 0.134283 0.167683 0.0334 min
Max cumulative refresh time across primary shard 0.9094 1.28473 0.37533 min
Cumulative flush time of primary shards 9.66832 10.9901 1.32183 min
Cumulative flush count of primary shards 108 108 0
Min cumulative flush time across primary shard 0 0 0 min
Median cumulative flush time across primary shard 0.412817 0.559483 0.14667 min
Max cumulative flush time across primary shard 6.84105 7.92678 1.08573 min
Total Young Gen GC time 10.689 13.311 2.622 s
Total Young Gen GC count 331 388 57
Total Old Gen GC time 0 0 0 s
Total Old Gen GC count 0 0 0
Store size 18.9309 18.8594 -0.07145 GB
Translog size 1.02632e-06 1.02632e-06 0 GB
Heap used for segments 0 0 0 MB
Heap used for doc values 0 0 0 MB
Heap used for terms 0 0 0 MB
Heap used for norms 0 0 0 MB
Heap used for points 0 0 0 MB
Heap used for stored fields 0 0 0 MB
Segment count 8 8 0
Min Throughput index-append 82296 79293.6 -3002.41 docs/s
Mean Throughput index-append 84559.8 81955.2 -2604.62 docs/s
Median Throughput index-append 85043.9 82151.8 -2892.06 docs/s
Max Throughput index-append 86642 84177.4 -2464.55 docs/s
50th percentile latency index-append 422.46 438.723 16.2632 ms
90th percentile latency index-append 577.558 604.241 26.6835 ms
99th percentile latency index-append 1186.47 1139.63 -46.838 ms
99.9th percentile latency index-append 5119.46 5744.85 625.393 ms
99.99th percentile latency index-append 6135.93 7254.43 1118.5 ms
100th percentile latency index-append 6654.76 8112.11 1457.35 ms
50th percentile service time index-append 422.461 438.702 16.2409 ms
90th percentile service time index-append 577.579 604.214 26.635 ms
99th percentile service time index-append 1185.57 1139.67 -45.9 ms
99.9th percentile service time index-append 5119.46 5744.85 625.393 ms
99.99th percentile service time index-append 6135.93 7254.43 1118.5 ms
100th percentile service time index-append 6654.76 8112.11 1457.35 ms
error rate index-append 0 0 0 %
Min Throughput wait-until-merges-finish 0.00288805 0.00668173 0.00379 ops/s
Mean Throughput wait-until-merges-finish 0.00288805 0.00668173 0.00379 ops/s
Median Throughput wait-until-merges-finish 0.00288805 0.00668173 0.00379 ops/s
Max Throughput wait-until-merges-finish 0.00288805 0.00668173 0.00379 ops/s
100th percentile latency wait-until-merges-finish 346254 149662 -196593 ms
100th percentile service time wait-until-merges-finish 346254 149662 -196593 ms
error rate wait-until-merges-finish 0 0 0 %
Min Throughput wait-until-merges-1-seg-finish 112.547 123.496 10.9489 ops/s
Mean Throughput wait-until-merges-1-seg-finish 112.547 123.496 10.9489 ops/s
Median Throughput wait-until-merges-1-seg-finish 112.547 123.496 10.9489 ops/s
Max Throughput wait-until-merges-1-seg-finish 112.547 123.496 10.9489 ops/s
100th percentile latency wait-until-merges-1-seg-finish 8.39773 7.79346 -0.60427 ms
100th percentile service time wait-until-merges-1-seg-finish 8.39773 7.79346 -0.60427 ms
error rate wait-until-merges-1-seg-finish 0 0 0 %

Labels: discuss, feature, RFC, security

Successfully merging this pull request may close these issues: On The Fly Encryption Feature Proposal