-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add support for IO[bytes]
and bytes
in scan_{...}
functions
#18532
Conversation
06161e2
to
1328ea6
Compare
separator: u8, | ||
quote_char: Option<u8>, | ||
comment_prefix: Option<&CommentPrefix>, | ||
eol_char: u8, | ||
has_header: bool, | ||
) -> PolarsResult<usize> { | ||
let file = if is_cloud_url(path) || config::force_async() { | ||
#[cfg(feature = "cloud")] | ||
{ | ||
feature_gated!("cloud", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by: this is the preferred way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, much nicer.
let file_options = FileScanOptions { | ||
slice: self.n_rows.map(|x| (0, x)), | ||
with_columns: None, | ||
cache: false, | ||
row_index: self.row_index, | ||
rechunk: self.rechunk, | ||
file_counter: 0, | ||
hive_options: Default::default(), | ||
hive_options: HiveOptions { | ||
enabled: Some(false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HiveOptions::default sets enabled to Some(true)
this is problematic. So I changed it here.
|
||
let memslice = match source { | ||
ScanSourceRef::File(path) => { | ||
let file = if run_async { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by: feature_gated
is preferred
|
||
#[cfg(feature = "cloud")] | ||
{ | ||
feature_gated!("cloud", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by: feature_gated
is preferred
@@ -11,7 +10,7 @@ use rayon::prelude::*; | |||
use super::*; | |||
|
|||
pub struct IpcExec { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We remove the options here, basically because the memory map options do not do anything anymore
/// # Notes | ||
/// | ||
/// - Scan sources with in-memory buffers are ignored. | ||
pub(crate) fn agg_source_paths<'a>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by: remove the allocation here
Putting this back into draft for a second. I want to try to remove the specialized |
fbdbe4c
to
0b49b13
Compare
BytesIO
in scan_{...}
functionsIO[bytes]
and bytes
in scan_{...}
functions
In the end I chose not to do this, because it would require significant mappings. |
@@ -339,29 +340,3 @@ def test_ipc_decimal_15920( | |||
path = f"{tmp_path}/data" | |||
df.write_ipc(path) | |||
assert_frame_equal(pl.read_ipc(path), df) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this test as it does not really make much sense. The mmaps are unmapped when they are finished, I feel like a bug was triggering this behavior before. For the streaming engine it does make a lot of sense to have a test though and the checks for this are still in place.
This solves one of the subgoals of #13040. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #18532 +/- ##
==========================================
- Coverage 79.93% 79.89% -0.04%
==========================================
Files 1505 1506 +1
Lines 202628 203040 +412
Branches 2873 2889 +16
==========================================
+ Hits 161976 162226 +250
- Misses 40104 40264 +160
- Partials 548 550 +2 ☔ View full report in Codecov by Sentry. |
In the end, a lot more things had to be touched than I thought originally. @ritchie46 sorry for the large review 😓 |
This PR adds support for the
bytes
andIO[bytes]
interface to allscan_{parquet, csv, ...}
functions.Since the same code was being touched. This PR also fixes #18581.
Fixes #4950.
Fixes #12617.