kgutwin opened this issue
Sep 24, 2024
· 2 comments
· Fixed by #18913
Assignees
Labels
A-plan (Area: logical plan and intermediate representation), accepted (Ready for implementation), bug (Something isn't working), P-medium (Priority: medium), python (Related to Python Polars), regression (Issue introduced by a new release)
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import io
import polars as pl

csvf = "a,b\n1,10\n2,20\n3,30"
lf = pl.scan_csv(io.BytesIO(csvf.encode()))
lf.collect()  # necessary to trigger the bug
print(lf.with_row_index().collect())
Alternative misbehavior:
import io
import polars as pl

csvf = "a,b\n1,10\n2,20\n3,30"
lf = pl.scan_csv(io.BytesIO(csvf.encode()))
lf.with_row_index().collect()  # necessary to trigger the bug
print(lf.collect())
Log output
read files in parallel
file < 128 rows, no statistics determined
no. of chunks: 1 processed by: 1 threads.
Issue description
When using with_row_index() on a LazyFrame derived from scan_csv(), making a call to collect() seems to "freeze" the LazyFrame in that state and no further calls to with_row_index() have the desired effect.
If you create the LazyFrame via scan_csv() and then immediately call lf.collect(), a subsequent call to lf.with_row_index() will not add the index column.
If you create the lazy frame and then immediately call lf.with_row_index(), the LazyFrame will always have the index column, even if you call lf.collect() on the original LazyFrame reference.
This has only been observed with LazyFrames created from CSV files; a LazyFrame created from a dict does not show this behavior.
In the second example, where lf.with_row_index().collect() runs first, the output of print(lf.collect()) has the row index, but it shouldn't.
This behavior is not present in polars 1.7.1.
Expected behavior
Installed versions