-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add opaque type support to PrestoSerializer #11256
Conversation
This pull request was exported from Phabricator. Differential Revision: D64362525 |
✅ Deploy Preview for meta-velox canceled.
|
Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Differential Revision: D64362525
f058cb1
to
152650b
Compare
This pull request was exported from Phabricator. Differential Revision: D64362525 |
Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Differential Revision: D64362525
152650b
to
59b424d
Compare
Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Differential Revision: D64362525
This pull request was exported from Phabricator. Differential Revision: D64362525 |
|
||
static std::shared_ptr<Foo> create(int64_t id) { | ||
// Return the same instance if the id is already in the map, to make | ||
// the operator== work with the instance before and after serde. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I followed. Why do you need to compare that they are the same instances, and not just different instances which are equal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's because the comparison of shared_ptr will compare the raw pointer address, so it needs to be the same instance. we could change the test utilities to de-reference before comparisons if desired.
continue; | ||
} | ||
stream->appendNonNull(); | ||
std::shared_ptr<void> ptr = rawValues[offset]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unused variable
@@ -1935,6 +2001,39 @@ void serializeFlatVector<TypeKind::BOOLEAN>( | |||
} | |||
} | |||
|
|||
template <> | |||
void serializeFlatVector<TypeKind::OPAQUE>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation is used by the PrestoBatchVectorSerializer
only. You also need to add an implementation for PrestoIterativeVectorSerializer
. I am not sure which one you use for your use-case. To give you more context the PrestoBatchVectorSerializer
is optimized for serializing full vectors whereas PrestoIterativeVectorSerializer
is used for serializing subsets of rows scattered within a single vector(which is usually used for a partitioning exchange).
the serializeFlatVector
implementation for PrestoIterativeVectorSerializer
is here:
velox/velox/serializers/PrestoSerializer.cpp
Line 2737 in a33e8d7
void serializeFlatVector<TypeKind::OPAQUE>( |
Finally, for a complete implementation, you also need to implement estimateSerializedSize
for both types of serializers. This is used by the PartitionedOutput node to figure out when it needs to flush data to the wire.
} | ||
|
||
const int64_t rawStringSizeBytes = dataSize * sizeof(char); | ||
char* rawString = (char*)pool->allocate(rawStringSizeBytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: instead of directly using the pool's allocate API that returns a raw pointer, I would suggest create a buffer and using a BufferPtr instead since its much more safer as the space will be freed when BufferPtr is destroyed, otherwise if any of the steps throw after this then it can introduce a memory leak.
Its like using a shared_ptr vs a raw ptr and has similar benefits.
You can use:
BufferPtr newBuffer = AlignedBuffer::allocate<char>(dataSize, pool());
@@ -1555,6 +1589,7 @@ TEST_F(PrestoSerializerTest, deserializeSingleColumn) { | |||
DOUBLE(), | |||
VARCHAR(), | |||
TIMESTAMP(), | |||
OPAQUE<Foo>(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just adding a type to deserializeSingleColumn
which is not used for serialization across the wire (See method comment of deserializeSingleColumn
for more info). Therefore, you would also want to add test coverage for the serialize (both batch and iterative) and deserialize methods.
Also would have to add tests for estimateSerializedSize
when you implement it
59b424d
to
8f9036a
Compare
Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Differential Revision: D64362525
This pull request was exported from Phabricator. Differential Revision: D64362525 |
8f9036a
to
e21d43b
Compare
Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Differential Revision: D64362525
This pull request was exported from Phabricator. Differential Revision: D64362525 |
addressing comments
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just a few nits. The build is failing, can you please take a look into it
const SerdeOpts&, | ||
VectorPtr& result) { | ||
// Opaque values are serialized by first converting them to string | ||
// them serializing them as if they were string. The deserializable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "then" serializing
auto inputVector = makeFlatVector<std::shared_ptr<void>>( | ||
2, | ||
[](vector_size_t row) { return Foo::create(row + 10); }, | ||
/*isNullAt=*/nullptr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add a test with nulls for completeness
@@ -1339,6 +1373,25 @@ TEST_P(PrestoSerializerTest, encodedRoundtrip) { | |||
} | |||
} | |||
|
|||
TEST_P(PrestoSerializerTest, opaqueBatchVectorSerializer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add a test to ensure that opaque type will throw an error if used with Iterative serializer and if estimateSerializedSize is used
Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Differential Revision: D64362525
e21d43b
to
a64a8f7
Compare
This pull request was exported from Phabricator. Differential Revision: D64362525 |
|
Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Differential Revision: D64362525
a64a8f7
to
32aa0da
Compare
This pull request was exported from Phabricator. Differential Revision: D64362525 |
32aa0da
to
352d45e
Compare
Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Reviewed By: bikramSingh91 Differential Revision: D64362525
This pull request was exported from Phabricator. Differential Revision: D64362525 |
…11256) Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Reviewed By: bikramSingh91 Differential Revision: D64362525
352d45e
to
cbc819c
Compare
This pull request was exported from Phabricator. Differential Revision: D64362525 |
…11256) Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Reviewed By: bikramSingh91 Differential Revision: D64362525
cbc819c
to
9fae51d
Compare
This pull request was exported from Phabricator. Differential Revision: D64362525 |
…11256) Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Reviewed By: bikramSingh91 Differential Revision: D64362525
9fae51d
to
955e1e9
Compare
This pull request was exported from Phabricator. Differential Revision: D64362525 |
…or#11256) Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Reviewed By: bikramSingh91 Differential Revision: D64362525
955e1e9
to
d5f0ad7
Compare
This pull request was exported from Phabricator. Differential Revision: D64362525 |
…or#11256) Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Reviewed By: bikramSingh91 Differential Revision: D64362525
d5f0ad7
to
da43424
Compare
This pull request was exported from Phabricator. Differential Revision: D64362525 |
…11256) Summary: The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via OpaqueType::registerSerialization(); then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque. Reviewed By: bikramSingh91 Differential Revision: D64362525
da43424
to
a857cfb
Compare
This pull request was exported from Phabricator. Differential Revision: D64362525 |
This pull request has been merged in ce67924. |
Summary:
The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via
OpaqueType::registerSerialization();
then inside
PrestoSerializer.cpp
we use these to write and read values of type opaque.Differential Revision: D64362525