-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter pushdown discrepancy between duckdb and duckplyr #172
Comments
I enountered a similar thing where I tried |
Would be good to figure out what's going wrong here. |
I had a look at the problem. To me it seems that the issue arises from the fact with Duckplyr the expression This can be observed from above query plan as the Duckplyr FILTER-operation filters on this expression: If my debugging is correct, the pushdown operation here is handled ultimately by DuckDB's optimizer in FilterResult FilterCombiner::AddFilter(Expression &expr) . The function pushes down expression which are either foldable or of expression class From DuckDB's perspective the expression I don't have a solution at hand, but I'll look into the possibility of interpreting |
Thanks for the investigation. I wonder if we somehow can label 2 > TRUE
#> [1] TRUE Created on 2024-09-28 with reprex v2.1.0 |
I'm not sure if we can label the expression If I've understood correctly, we can only pass the following expression types via the duckdb relational api atm
To enable pushdown, we would need a comparison expression. And to ensure we can fallback on the edge cases @krlmlr mentioned, we would need type information to make a comparison operator only when the types are indeed comparable by duckdb. So the logic would be:
I though about altering the expression type within rapi_rel_filter, but we don't have type information at that point (ParsedExpressions do not contain types). The only idea I have is to introduce a ComparisonExpression to the relation api (ie. rapi_expr_comparison) to match duckdb's expression types. This would enable us to make comparison expressions when translating into duckdb expressions, but idk if there's a easier way or this conflicts some other design choices. |
I added a sketch of the approach outlined above (using comparison expressions). The draft PR's translate What do you think about this approach @krlmlr ? |
Thanks, nice! I need to take a closer look here. |
This issue is related to the discussion in this comment:
#145 (comment)
The following code demonstrates the difference in filter
pushdown between DuckDB and Duckplyr as shown by the
explain()
function.
With Duckplyr, the filter does not appear to be pushed down to the
Parquet files.
As a result,
duckplyr
is slightly slower compared toduckdb
.Created on 2024-05-21 with reprex v2.1.0
Session info
The text was updated successfully, but these errors were encountered: