Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(rust): ParquetCloudSink to allow streaming pipelines into remote ObjectStores #10060
feat(rust): ParquetCloudSink to allow streaming pipelines into remote ObjectStores #10060
Changes from 57 commits
92aa917
50d04a3
c7073bd
09654e4
9814ddc
9e2050c
4a6348a
4c0cdb2
cef96ae
1dbac7a
2603a57
eb340e3
76be1c9
7b95eb4
98257e7
b82d47d
43aaa6c
d02095f
f41e0b7
23dd8c9
e915d25
5bcc9eb
44f876a
7a1355f
a78a79e
222d285
4782d87
db67914
1b1d24a
cefebe1
7f3d52b
036be68
628fda6
4bc92e9
326e5ae
cfe8e5b
46c1cd3
eda196c
24cccbc
ca7af57
dd0fffe
a6e4842
e44c91c
4c97d1d
c2184e8
18ecf79
f646668
5ba5570
2cf8c44
c378d91
121fbb0
75c3515
6dfe076
1d4fef6
fc0ceea
73c02e0
2861c17
0e9154e
a70e80a
a41b5b9
1f7cd12
26fa1f6
ab968bd
51c5dad
b9ccae8
091a87f
58c1966
7e0d694
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the
cloud_options
parameter, this looks exactly the same as thesink_parquet
method. Isn't there a possibility that you can just pass in the uri and the write/sink methods can determine if it's an uri or a local path?That would make it much more easy to maintain the code, and open up all methods to natively read & write to local/cloud based sources/targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While possible, there are some drawbacks of doing so:
sink_parquet
more difficult to use for anyone not using the cloud feature, since for writing to a normal file you'd now need to pass in an extraNone
every time.cloud_write
enables/disablessink_parquet_cloud
. Attempting to use it without the feature flag enabled results in a compile-time error. If it would alter the internals ofsink_parquet
, the error would only be caught at runtime.That is why I decided to keep it separate in this PR. Unifying the APIs is definitely interesting and maybe this refactor might be worth considering in a future version, but I did not want to rush this for these reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I keep forgetting Rust doesn't have default parameters... Pity because here it would be so much more helpful!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this PR enable us to write LazyFrames to cloud storage using sink_parquet? If so, how do I so this? I can't figure it out.