Ability to sink lazy datasets to STDOUT or to files #18834

Open
aaronsteers opened this issue Sep 20, 2024 · 0 comments
Labels
enhancement New feature or an improvement of an existing feature

aaronsteers commented Sep 20, 2024

Description

I am reading a massive JSONL file (several GB), and I want to send a transformed version of it to STDOUT (or to another file buffer). The file is larger than memory, so I'm hoping to use the lazy methods to process it.

It doesn't look like this is possible today. The write_*() methods explicitly accept a file-like object, and I found previous issues in the repo discussing how they can be used to send data to STDOUT. However, the sink_*() methods expect str | Path and do not accept a file-like object to write to.

Is it possible that the sink_*() methods could also support file-like outputs?
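
For anyone who needs this before sink_*() supports file-like outputs, here is a minimal workaround sketch using only the standard library. It assumes you pass in any callable that writes its result to a path (for example a LazyFrame's sink_ndjson); the helper name `sink_to_file_like` and the temporary file name are hypothetical, not part of Polars. It trades extra disk space for bounded memory use.

```python
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO, Callable


def sink_to_file_like(
    sink_fn: Callable[[Path], object],
    out: BinaryIO,
    chunk_size: int = 1 << 20,
) -> None:
    """Run a path-only sink, then stream its output into `out` in chunks.

    `sink_fn` is assumed to be any callable that writes to the path it is
    given (hypothetically, `lf.sink_ndjson` for a Polars LazyFrame `lf`).
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical temp file name; it is removed with the directory.
        tmp_path = Path(tmp_dir) / "sink_output"
        sink_fn(tmp_path)
        # Copy in fixed-size chunks so the full result never sits in memory.
        with tmp_path.open("rb") as src:
            shutil.copyfileobj(src, out, chunk_size)
```

With a LazyFrame `lf`, the intended call would look like `sink_to_file_like(lf.sink_ndjson, sys.stdout.buffer)`. This is only a stopgap: the data is written to disk once and read back once, which native file-like support in sink_*() would avoid.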

Update:

aaronsteers added the enhancement label Sep 20, 2024