Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add opaque type support to PrestoSerializer #11256

Closed
wants to merge 1 commit into from

Conversation

kunigami
Copy link
Contributor

Summary:
The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via

OpaqueType::registerSerialization();

then inside PrestoSerializer.cpp we use these to write and read values of type opaque.

Differential Revision: D64362525

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 15, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525

Copy link

netlify bot commented Oct 15, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit a857cfb
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/673beab0d4180200083c41f1

kunigami pushed a commit to kunigami/velox that referenced this pull request Oct 29, 2024
Summary:

The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque.

Differential Revision: D64362525
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525

kunigami pushed a commit to kunigami/velox that referenced this pull request Oct 29, 2024
Summary:

The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque.

Differential Revision: D64362525
kunigami pushed a commit to kunigami/velox that referenced this pull request Oct 31, 2024
Summary:

The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque.

Differential Revision: D64362525
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525


static std::shared_ptr<Foo> create(int64_t id) {
// Return the same instance if the id is already in the map, to make
// the operator== work with the instance before and after serde.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I followed. Why do you need to compare that they are the same instances, and not just different instances which are equal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because the comparison of shared_ptr will compare the raw pointer address, so it needs to be the same instance. we could change the test utilities to de-reference before comparisons if desired.

continue;
}
stream->appendNonNull();
std::shared_ptr<void> ptr = rawValues[offset];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unused variable

@@ -1935,6 +2001,39 @@ void serializeFlatVector<TypeKind::BOOLEAN>(
}
}

template <>
void serializeFlatVector<TypeKind::OPAQUE>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is used by the PrestoBatchVectorSerializer only. You also need to add an implementation for PrestoIterativeVectorSerializer. I am not sure which one you use for your use-case. To give you more context the PrestoBatchVectorSerializer is optimized for serializing full vectors whereas PrestoIterativeVectorSerializer is used for serializing subsets of rows scattered within a single vector(which is usually used for a partitioning exchange).

the serializeFlatVector implementation for PrestoIterativeVectorSerializer is here:

void serializeFlatVector<TypeKind::OPAQUE>(

Finally, for a complete implementation, you also need to implement estimateSerializedSize for both types of serializers. This is used by the PartitionedOutput node to figure out when it needs to flush data to the wire.

}

const int64_t rawStringSizeBytes = dataSize * sizeof(char);
char* rawString = (char*)pool->allocate(rawStringSizeBytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: instead of directly using the pool's allocate API that returns a raw pointer, I would suggest create a buffer and using a BufferPtr instead since its much more safer as the space will be freed when BufferPtr is destroyed, otherwise if any of the steps throw after this then it can introduce a memory leak.

Its like using a shared_ptr vs a raw ptr and has similar benefits.

You can use:

BufferPtr newBuffer = AlignedBuffer::allocate<char>(dataSize, pool());

@@ -1555,6 +1589,7 @@ TEST_F(PrestoSerializerTest, deserializeSingleColumn) {
DOUBLE(),
VARCHAR(),
TIMESTAMP(),
OPAQUE<Foo>(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just adding a type to deserializeSingleColumn which is not used for serialization across the wire (See method comment of deserializeSingleColumn for more info). Therefore, you would also want to add test coverage for the serialize (both batch and iterative) and deserialize methods.

Also would have to add tests for estimateSerializedSize when you implement it

kunigami pushed a commit to kunigami/velox that referenced this pull request Nov 12, 2024
Summary:

The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque.

Differential Revision: D64362525
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525

kunigami pushed a commit to kunigami/velox that referenced this pull request Nov 12, 2024
Summary:

The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque.

Differential Revision: D64362525
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525

@kunigami
Copy link
Contributor Author

addressing comments

  • use AlignedBuffer::allocate()
  • discussed w/ bikramSingh91 that we don't need PrestoIterativeVectorSerializer and estimateSerializedSize
  • add test case for opaque type serialization/deserialization

Copy link
Contributor

@bikramSingh91 bikramSingh91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a few nits. The build is failing, can you please take a look into it

const SerdeOpts&,
VectorPtr& result) {
// Opaque values are serialized by first converting them to string
// them serializing them as if they were string. The deserializable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "then" serializing

auto inputVector = makeFlatVector<std::shared_ptr<void>>(
2,
[](vector_size_t row) { return Foo::create(row + 10); },
/*isNullAt=*/nullptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also add a test with nulls for completeness

@@ -1339,6 +1373,25 @@ TEST_P(PrestoSerializerTest, encodedRoundtrip) {
}
}

TEST_P(PrestoSerializerTest, opaqueBatchVectorSerializer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also add a test to ensure that opaque type will throw an error if used with Iterative serializer and if estimateSerializedSize is used

kunigami pushed a commit to kunigami/velox that referenced this pull request Nov 12, 2024
Summary:

The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque.

Differential Revision: D64362525
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525

@kunigami
Copy link
Contributor Author

  • address comments. use find() instead of contains() so it doesn't rely on newer c++ features

kunigami pushed a commit to kunigami/velox that referenced this pull request Nov 13, 2024
Summary:

The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque.

Differential Revision: D64362525
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525

kunigami pushed a commit to kunigami/velox that referenced this pull request Nov 18, 2024
Summary:

The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque.

Reviewed By: bikramSingh91

Differential Revision: D64362525
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525

kunigami pushed a commit to kunigami/velox that referenced this pull request Nov 18, 2024
…11256)

Summary:

The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque.

Reviewed By: bikramSingh91

Differential Revision: D64362525
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525

@kunigami kunigami changed the title add opaque type support to PrestoSerializer feat: add opaque type support to PrestoSerializer Nov 18, 2024
kunigami pushed a commit to kunigami/velox that referenced this pull request Nov 18, 2024
…11256)

Summary:

The idea of opaque types in Presto serializer is that we'll serialize the underlying opaque type to StringView. This requires a serde callbacks to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values of type opaque.

Reviewed By: bikramSingh91

Differential Revision: D64362525
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525

kunigami pushed a commit to kunigami/velox that referenced this pull request Nov 18, 2024
…11256)

Summary:

The idea of opaque types in Presto serializer is that we'll serialize the 
underlying opaque type to StringView. This requires a serde callbacks 
to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values
of type opaque.

Reviewed By: bikramSingh91

Differential Revision: D64362525
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525

@kunigami kunigami changed the title feat: add opaque type support to PrestoSerializer feature: add opaque type support to PrestoSerializer Nov 19, 2024
kunigami pushed a commit to kunigami/velox that referenced this pull request Nov 19, 2024
…or#11256)

Summary:

The idea of opaque types in Presto serializer is that we'll serialize the 
underlying opaque type to StringView. This requires a serde callbacks 
to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values
of type opaque.

Reviewed By: bikramSingh91

Differential Revision: D64362525
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525

kunigami pushed a commit to kunigami/velox that referenced this pull request Nov 19, 2024
…or#11256)

Summary:

The idea of opaque types in Presto serializer is that we'll serialize the 
underlying opaque type to StringView. This requires a serde callbacks 
to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values
of type opaque.

Reviewed By: bikramSingh91

Differential Revision: D64362525
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525

@kunigami kunigami changed the title feature: add opaque type support to PrestoSerializer feat: Add opaque type support to PrestoSerializer Nov 19, 2024
…11256)

Summary:

The idea of opaque types in Presto serializer is that we'll serialize the 
underlying opaque type to StringView. This requires a serde callbacks 
to be registered via

  OpaqueType::registerSerialization();

then inside `PrestoSerializer.cpp` we use these to write and read values
of type opaque.

Reviewed By: bikramSingh91

Differential Revision: D64362525
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64362525

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in ce67924.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants