The tidier package provides ‘Apache Spark’ style window aggregation for R dataframes and remote dbplyr tbls via ‘mutate’ in ‘dplyr’ flavour.
Example: create a new column with the average temperature over the last seven days within the same month.
set.seed(101)
airquality |>
  # create date column
  dplyr::mutate(date_col = lubridate::make_date(1973, Month, Day)) |>
  # create gaps by removing some days
  dplyr::slice_sample(prop = 0.8) |>
  # compute mean temperature over last seven days in the same month
  tidier::mutate(avg_temp_over_last_week = mean(Temp, na.rm = TRUE),
                 .order_by = Day,
                 .by = Month,
                 .frame = c(lubridate::days(7), # 7 days before current row
                            lubridate::days(-1) # do not include current row
                            ),
                 .index = date_col
                 )
#> # A tibble: 122 × 8
#> Month Ozone Solar.R Wind Temp Day date_col avg_temp_over_last_week
#> <int> <int> <int> <dbl> <int> <int> <date> <dbl>
#> 1 6 NA 286 8.6 78 1 1973-06-01 NaN
#> 2 6 NA 242 16.1 67 3 1973-06-03 78
#> 3 6 NA 186 9.2 84 4 1973-06-04 72.5
#> 4 6 NA 264 14.3 79 6 1973-06-06 76.3
#> 5 6 29 127 9.7 82 7 1973-06-07 77
#> 6 6 NA 273 6.9 87 8 1973-06-08 78
#> 7 6 NA 259 10.9 93 11 1973-06-11 83
#> 8 6 NA 250 9.2 92 12 1973-06-12 85.2
#> 9 6 23 148 8 82 13 1973-06-13 86.6
#> 10 6 NA 332 13.8 80 14 1973-06-14 87.2
#> # ℹ 112 more rows
mutate supports .by (group by), .order_by (order by), .frame (endpoints of window frame), .index (identify index column like date column, in df version only), and .complete (whether to compute over incomplete window, in df version only).

mutate automatically uses a future backend (via furrr, in df version only).
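
As a minimal sketch of a row-based frame (no .index column), the call below computes a rolling mean over the current row and the two rows before it, within each month. It assumes that a numeric .frame is read as c(rows before, rows after), by analogy with the date-based example above, and that incomplete windows at the start of each group are still computed when .complete is left at its default. The column name avg_temp_3_rows is made up for illustration.

```r
# Assumes the tidier package is installed; airquality ships with base R.
rolling <- airquality |>
  tidier::mutate(
    avg_temp_3_rows = mean(Temp, na.rm = TRUE), # illustrative column name
    .by = Month,       # restart the window for every month
    .order_by = Day,   # order rows by day within a month
    .frame = c(2, 0)   # assumed meaning: 2 rows before, 0 rows after
  )

head(rolling)
```

Swapping .frame for a pair of lubridate periods and adding .index, as in the example at the top, turns the same call into a time-based window.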
This implementation is inspired by Apache Spark’s windowSpec class with rangeBetween and rowsBetween.
- dbplyr implements this via dbplyr::win_over, enabling sparklyr users to write window computations. Also see dbplyr::window_order / dbplyr::window_frame. tidier’s mutate wraps this functionality via uniform syntax across dataframes and remote tbls.
- The tidypyspark python package implements a mutate style window computation API for pyspark.
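
For comparison, the dbplyr route mentioned above can be sketched with dbplyr::window_order and dbplyr::window_frame on a lazy table. This is only an illustration of plain dbplyr, not of how tidier is implemented: the toy data, the simulated Postgres connection, and the column name avg_temp are assumptions chosen so that the SQL can be rendered without a live database.

```r
library(dplyr)
library(dbplyr)

# A lazy table backed by a simulated connection: no database is needed,
# only the generated SQL is inspected.
lazy_tbl <- lazy_frame(
  Month = c(5L, 5L, 6L),
  Day   = c(1L, 2L, 1L),
  Temp  = c(67, 72, 78),
  con   = simulate_postgres()
)

query <- lazy_tbl |>
  group_by(Month) |>        # becomes PARTITION BY
  window_order(Day) |>      # becomes ORDER BY inside the window
  window_frame(-2, 0) |>    # two preceding rows up to the current row
  mutate(avg_temp = mean(Temp, na.rm = TRUE))

show_query(query)
```

The grouping, ordering and frame are spread over three separate verbs here, whereas tidier’s mutate takes them as .by, .order_by and .frame in a single call.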
- dev: remotes::install_github("talegari/tidier")
- cran: install.packages("tidier")
The tidier package is deeply indebted to three amazing packages and the people behind them.
Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A
Grammar of Data Manipulation_. R package version 1.1.0,
<https://CRAN.R-project.org/package=dplyr>.
Vaughan D (2021). _slider: Sliding Window Functions_. R package
version 0.2.2, <https://CRAN.R-project.org/package=slider>.
Wickham H, Girlich M, Ruiz E (2023). _dbplyr: A 'dplyr' Back End
for Databases_. R package version 2.3.2,
<https://CRAN.R-project.org/package=dbplyr>.