-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Add support for multi get operation for database queries #2396
perf: Add support for multi get operation for database queries #2396
Conversation
f9e0289
to
2532b06
Compare
97f786f
to
5405e6c
Compare
2920ae8
to
cff7b47
Compare
cff7b47
to
b23e4be
Compare
Proposals to the #2396
crates/storage/src/kv_store.rs
Outdated
@@ -41,7 +44,7 @@ pub trait StorageColumn: Copy + core::fmt::Debug { | |||
|
|||
/// The definition of the key-value inspection store. | |||
#[impl_tools::autoimpl(for<T: trait> &T, &mut T, Box<T>)] | |||
pub trait KeyValueInspect { | |||
pub trait KeyValueInspect: Send + Sync { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is very sad that we need to require Send + Sync
only because BoxedIter
requires Send
. Maybe we can define non Send boxed iterator instead and use it(because it seems we don't need Send
feature, but maybe I'm wrong)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this bound is just to satisfy the BoxedIter
requirement. I can see if I can define a non-send one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hitting some walls with this implementation. Will try again tomorrow with fresh eyes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright my mistake was trying to change some iterators that we turn into streams, and they need to be Send
. Now I have managed to get rid of these trait bounds cbb6efc.
Now we have two boxed iterators: BoxedIter
which doesn't require Send, and BoxedIterSend
for the cases when you need Send.
<Self as StorageBatchInspect<OldTransactions>>::get_batch(self, ids) | ||
.map(|result| result.and_then(|opt| opt.ok_or(not_found!(OldTransactions)))) | ||
.into_boxed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want, you can add the same syntax sugars that we did with storage_as_ref
to avoid <Self as StorageInspect<M>>
usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting, that would be nice. I'll look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out this is not as trivial as I had thought, as we're hitting some lifetime issues.
The problem
get_batch
returns an iterator bound to the lifetime of the self
parameter. I.e. the storage we call it on. This is necessary since we do self.get(...)
within the KeyValueInspect::get_batch
implementation.
So while we can implement StorageBatchInspect
for the StorageRef
type as:
impl<'a, S, Type> StorageBatchInspect<Type> for StorageRef<'a, S, Type>
where
S: StorageBatchInspect<Type>,
Type: Mappable,
{
#[inline(always)]
fn get_batch<'b, Iter>(
&'b self,
_keys: Iter,
) -> impl Iterator<Item = Result<Option<Type::OwnedValue>>> + 'b
where
Iter: 'b + IntoIterator<Item = &'b Type::Key>,
Type::Key: 'b,
{
None.into_iter() // Note that we'd need access to `self.0` here which requires VM changes, or copying the `StorageRef` type into this crate. But that's a separate, and very manageable issue
}
}
when we try to use it as
fn old_transactions<'a>(
&'a self,
ids: BoxedIter<'a, &'a TxId>,
) -> BoxedIter<'a, StorageResult<Transaction>> {
self.storage::<OldTransactions>()
.get_batch(ids)
.map(|result| result.and_then(|opt| opt.ok_or(not_found!(OldTransactions))))
.into_boxed()
}
we're hitting this issue
error[E0716]: temporary value dropped while borrowed
--> crates/fuel-core/src/service/adapters/graphql_api/off_chain.rs:183:9
|
179 | fn old_transactions<'a>(
| -- lifetime `'a` defined here
...
183 | self.storage::<OldTransactions>()
| -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| _________creates a temporary value which is freed while still in use
| |
184 | | .get_batch(ids)
| |___________________________- argument requires that borrow lasts for `'a`
...
187 | }
| - temporary value is freed at the end of this statement
Going forward
For this PR I suggest we don't go down this rabbit hole, but I'd be happy to do it as a follow-up if you think it's possible/interesting to explore it. Seems like a quite low prio though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to get some benchmarks(I guess you need to add new one that will use GraphQL) =)
Since this PR add `into_bytes` for the encoder, we can optimize the batch mutate operations as well #2396 (review)
Sure thing. I'm not sure if it makes sense to use criterion/cargo bench for end-to-end performance testing though as it is more geared towards micro-benchmarking, and we need to do something like the following.
I'll look into our options for this type of workload. Let me know if you have any opinions or thoughts on this. |
32993e0
to
3a3d967
Compare
3a3d967
to
cbb6efc
Compare
I'll aim at defining these workloads as a stand-alone binary, which would allow us to use hyperfine to actually execute the benchmarks and interpret the results. |
Co-authored-by: Rafał Chabowski <88321181+rafal-ch@users.noreply.github.com>
storage: &'a S, | ||
keys: Iter, | ||
column: S::Column, | ||
) -> impl Iterator<Item = crate::Result<Option<M::OwnedValue>>> + 'a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
) -> impl Iterator<Item = crate::Result<Option<M::OwnedValue>>> + 'a | |
) -> impl IntoIterator<Item = crate::Result<Option<M::OwnedValue>>> + 'a |
Iterator
s implement IntoIterator
, so if you change the signature to return a IntoIterator
then StorageBatchInspect
becomes slightly more flexible, as implementations can return either an iterator or a collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general principle, I like interfaces to be very general in what they accept but prefer them to be specific in what they return. (see Robusness principle). That's why I typically accept IntoIterator
in parameters, because as you say then the caller doesn't have to care about doing `.into_iter()´ explicitly if they have a collection or anything.
Therefore, for return values it's better in my opinion to return the Iterator
directly since it is more specific. Otherwise, all callers are basically forced to call .into_iter()
before they can do anything with the returned iterator, which is less ergonomic. Any implementer of the trait can (and should) always call .into_iter()
if they are about to return a collection. Again, not doing so would just force this burden upon the user of the function which is less than ideal without providing any benefit whatsoever.
Let me know if you disagree and I'll be happy to further discuss this.
let transactions: Vec<_> = self | ||
.on_chain | ||
.transactions(tx_ids.iter().into_boxed()) | ||
.collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect you can avoid collecting here since later you iterate on transactions again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unforutnately I need to both iterate over the transactions and zip them with the on_chain_results - so we need to iterate twice over the tersults. We could potentially do something fancy with Itertools::tee, but I think this solution is simpler and easier on the eyes.
36386af
to
c3b7264
Compare
Got some initial benchmark results in #2433, and it doesn't look good for the multi-get implementation. So far it's either slower or the same for all workloads I've tried. I've done the following:
I'm all ears if anyone has any suggestions for workloads to try, but until we can find evidence that this change actually is an improvement in any way we should put this PR on hold. |
this is interesting, do you find any deviation from your impl and the benchmark code in fuel-core/benches/src/db_lookup_times_utils/utils.rs Lines 82 to 101 in 896e4cf
|
Yes, that code only tests the multi-get operation directly. I have no doubt that the multi-get operation gives a performance benefit in isolation, but with the current change it seems to be a net-negative performance effect. I suspect it's the boxed iterators, since each boxed iterator requires a vtable lookup to reference a single element if I'm not mistaken, and I would not be surprised if this overhead is enough to negate any positive effects of the multi-get. But we need to investigate this further before we can conclude anything for certain, and on my side this is de-prioritized in favor of the fraud-proving work. With some larger refactors in the GraphQL crate, we should be able to avoid the boxed iterators - but we need to decide how to proceed and prioritize that work first. |
Closing this PR for now, and we can re-open it when we want to revisit this. |
Linked Issues
Closes #2344
Description
This PR adds support for getting batches of values from the storage through a new
StorageBatchInspect
trait infuel-core-storage
.This trait is implemented for
StructuredStorage
andGenericDatabase
through a newget_batch
method added to theKeyValueInspect
andBluePrintInspect
traits. Theget_batch
method is implemented with themulti-get
operation for the RocksDB based storage implementations, and uses a default blanket implementation for in-memory implementations.Checklist
Before requesting review