You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@hadley: reusing summary variables may be a mistake, more often than not. This is not handled in duckplyr yet. Should we implement the dangerous behavior or find a better way in dplyr? We could require the reuse of a computed summary variable to use an adverb, for example:
options(conflicts.policy=list(warn=FALSE))
library(duckplyr)
data<- tibble(a= c(1L, 1:2), b=1:3, c=4:6)
data#> # A tibble: 3 × 3#> a b c#> <int> <int> <int>#> 1 1 1 4#> 2 1 2 5#> 3 2 3 6data|>
summarize(
.by=a,
b= sum(b),
c= sum(b*c),
)
#> # A tibble: 2 × 3#> a b c#> <int> <int> <int>#> 1 1 3 27#> 2 2 3 18data|>
as_duckplyr_df() |>
summarize(
.by=a,
b= sum(b),
c= sum(b*c),
)
#> materializing:#> ---------------------#> --- Relation Tree ---#> ---------------------#> Aggregate [a, sum(b), sum(*(b, c))]#> r_dataframe_scan(0x12cbe0ec8)#> #> ---------------------#> -- Result Columns --#> ---------------------#> - a (INTEGER)#> - b (HUGEINT)#> - c (HUGEINT)#> #> # A tibble: 2 × 3#> a b c#> <int> <dbl> <dbl>#> 1 2 3 18#> 2 1 3 14
sum(1:2*4:5)
#> [1] 14
sum(sum(1:2) *4:5)
#> [1] 27
I don't think we want to make any big design changes based on this unusual edge case. I would suggest duckdplyr warns when it encounters this situation and then we can use information from that to determine if we can pull it from dplyr.
@hadley: reusing summary variables may be a mistake, more often than not. This is not handled in duckplyr yet. Should we implement the dangerous behavior or find a better way in dplyr? We could require the reuse of a computed summary variable to use an adverb, for example:
Created on 2023-11-09 with reprex v2.0.2
The text was updated successfully, but these errors were encountered: