-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(rust): Adds sink_csv_cloud
to LazyFrame
#11037
Conversation
…ions` and `ParquetWriteOptions`
sink_csv_cloud
to LazyFrame
In order to better-group the cloud methods, how about
|
I don't really have any preference either way. @Qqwy would be doing a refactoring after his ipc-cloud method was added in PR #11008 . So I would be taking it up in that refactoring, wdyt? Furthermore, I also invite you to join the discussion I started in #11056. |
# Conflicts: # crates/polars-pipe/src/executors/sinks/file_sink.rs
Closing because no movement for a month now. |
I may be wrong here but would sink csv cloud yield any benefit? csv is row-based, which means you need to execute and load the data frame in memory anyway, and then write the rows. |
Obviously. CSV is record based. So as soon as you have a record, you can write it away and forget it again... So sinking is an obvious choice for that kind of file formats. |
Yeah, I poorly phrased my question. :) You can optimize the writing of the rows but I assume there is not much optimization we can push to the query itself, compared to a columnar format that would allow us to optimize the computation and streaming of each individual column. |
In any case, it probably makes sense to enable cloud operations to all formats, even if only for the convenience of skipping the local write to disk. |
True, with a columnar format you can optimize reading. Anyhow in my case I read in a parquet file and wrote it out as a csv file. So for these kind of operations sinking is useful. As for enabling all cloud operations to all formats: I'm waiting for PR #10786 to be merged in order to investigate a single-source of file reader/writer where any format plugs into. So whatever new format arrives, it can immediately benefit from the full scope of reading/writing capabilities in the rust engine. |
@alexander-beedie To play devil's advocate on the name, people usually know the format they want to work with, so I think having the format come first is more helpful. This way they can type |
In analogy with PR #11008 & PR #10060: adding sink_csv_cloud in rust.
As I don't have the rights for running rust builds on my computer here, I am opening this up as a draft until all checks check out.