[Python] Scan_pyarrow_dataset() no longer pushes down partition filtering to pyarrow #14343
Comments
You can try this in the meantime:
>>> print(
...     tr.filter((pl.col('CALENDAR_DATE') >= (pl.lit('2023-07-21'))) &
...               (pl.col('CALENDAR_DATE') <= (pl.lit('2024-01-22')))).explain(optimized=True)
... )
PYTHON SCAN
PROJECT */4 COLUMNS
SELECTION: ((pa.compute.field('CALENDAR_DATE') >= '2023-07-21') & (pa.compute.field('CALENDAR_DATE') <= '2024-01-22')) |
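A quick way to check a plan like the one above for pushdown is to look for a non-empty SELECTION line in the text returned by explain(). The helper below is only a sketch: selection_pushed_down is a hypothetical name, the plan text format differs between polars versions, and treating "None" as "nothing pushed down" is an assumption rather than documented behaviour.

```python
def selection_pushed_down(plan: str) -> bool:
    """Return True if the plan text contains a non-empty SELECTION line.

    Hypothetical helper: it only inspects the text, it does not query polars.
    """
    prefix = "SELECTION:"
    for line in plan.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            predicate = stripped[len(prefix):].strip()
            # Assumption: an empty or "None" predicate means no pushdown.
            return predicate not in ("", "None")
    # No SELECTION line at all: the filter was not handed to the scan.
    return False


# Plan text copied from the comment above.
plan = "\n".join([
    "PYTHON SCAN",
    "PROJECT */4 COLUMNS",
    "SELECTION: ((pa.compute.field('CALENDAR_DATE') >= '2023-07-21') & "
    "(pa.compute.field('CALENDAR_DATE') <= '2024-01-22'))",
])
print(selection_pushed_down(plan))
```

In practice the string would come from something like tr.filter(...).explain(optimized=True), as in the comment above.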
Can we get a bisect on this one? Which commit introduced the regression? |
@lmocsi did you see that 0.20.7 has the is_in fix for scan_parquet? |
Yes, I saw it. However, my code uses scan_pyarrow_dataset('myfolder'), so I'll wait for either:
|
My guess is this: |
It seems that in polars==0.20.3 the expression |
@lmocsi is this resolved? |
Yes. As of polars==1.5.0 it is working fine (maybe with earlier versions as well). |
Great! |
Checks
Reproducible example
Log output
Issue description
As of polars==0.20.4, scan_pyarrow_dataset() no longer pushes down partition filtering to pyarrow. In polars==0.20.3 it worked fine, as described in issue #13908.
Polars==0.20.3 + scan_pyarrow_dataset:
Polars==0.20.6 + scan_pyarrow_dataset:
Expected behavior
Push down partition filtering to pyarrow.
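To illustrate what is expected here, the following standard-library sketch shows the idea behind partition pruning: when the filter reaches the dataset layer, files whose partition value falls outside the range can be skipped without being read. It is a conceptual model only, not how polars or pyarrow implement it. The hive-style folder layout, the file names, and the functions partition_value and prune are assumptions made for illustration; only 'myfolder' and CALENDAR_DATE come from the thread.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import List, Optional


def partition_value(path: str, key: str) -> Optional[str]:
    """Extract the value of a hive-style 'key=value' directory from a path."""
    for part in PurePosixPath(path).parts:
        name, sep, value = part.partition("=")
        if sep and name == key:
            return value
    return None


def prune(paths: List[str], key: str, lo: date, hi: date) -> List[str]:
    """Keep only files whose partition value lies in [lo, hi]."""
    kept = []
    for p in paths:
        value = partition_value(p, key)
        if value is not None and lo <= date.fromisoformat(value) <= hi:
            kept.append(p)
    return kept


# Hypothetical dataset layout, partitioned by CALENDAR_DATE.
files = [
    "myfolder/CALENDAR_DATE=2023-07-20/part-0.parquet",
    "myfolder/CALENDAR_DATE=2023-07-21/part-0.parquet",
    "myfolder/CALENDAR_DATE=2024-01-22/part-0.parquet",
    "myfolder/CALENDAR_DATE=2024-01-23/part-0.parquet",
]

selected = prune(files, "CALENDAR_DATE", date(2023, 7, 21), date(2024, 1, 22))
for path in selected:
    print(path)
```

Without pushdown, every file is scanned and the filter is applied afterwards in polars, which is the slowdown this issue reports.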
Installed versions