Filtering with pl.col is substantially (27x) slower than filtering with pl.Series #18833
Open
Labels: bug, needs triage, python
Checks
Reproducible example
I have been unable to reproduce this issue without using my custom data. Below is a dataset that looks like my existing data (same dtypes) but is not the same. The issue does not reproduce with this fake data.
Log output
No response
Issue description
Filtering this column with data.filter(data["col_727"] == 1) executes in 0.5s, whereas data.filter(pl.col("col_727") == 1) executes in 13.6s, roughly 27x slower. The column is a float64 column containing only the values 0.0 and 1.0, with no null values. The ratio of 0.0 to 1.0 is 2:1, as in the fake data I've shared. What might be causing this sizable discrepancy? Could it be that Polars is not properly distributing compute across CPUs when pl.col is used?
Expected behavior
Execution times are the same or very similar.
Installed versions