diff --git a/README.Rmd b/README.Rmd
index 5ea54957..4db6da8d 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -24,12 +24,36 @@ set.seed(20230702)
 [![R-CMD-check](https://github.com/duckdblabs/duckplyr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/duckdblabs/duckplyr/actions/workflows/R-CMD-check.yaml)
 
-The goal of duckplyr is to provide a drop-in replacement for dplyr that uses DuckDB as a backend for fast operation.
-It also defines a set of generics that provide a low-level implementer's interface for dplyr's high-level user interface.
+The goal of duckplyr is to provide a drop-in replacement for dplyr that uses [DuckDB](https://duckdb.org/) as a backend for fast operation.
+DuckDB is an in-process SQL OLAP database management system.
 
-## Example
+duckplyr also defines a set of generics that provide a low-level implementer's interface for dplyr's high-level user interface.
 
-```{r load}
+## Installation
+
+Install duckplyr from CRAN with:
+
+``` r
+install.packages("duckplyr")
+```
+
+You can also install the development version of duckplyr from R-universe:
+
+``` r
+install.packages('duckplyr', repos = c('https://duckdblabs.r-universe.dev', 'https://cloud.r-project.org'))
+```
+
+Or from [GitHub](https://github.com/) with:
+
+``` r
+# install.packages("pak", repos = sprintf("https://r-lib.github.io/p/pak/stable/%s/%s/%s", .Platform$pkgType, R.Version()$os, R.Version()$arch))
+pak::pak("duckdblabs/duckplyr")
+```
+
+
+## Examples
+
+```{r attach}
 library(conflicted)
 library(duckplyr)
 conflict_prefer("filter", "duckplyr")
@@ -37,18 +61,19 @@ conflict_prefer("filter", "duckplyr")
 
 There are two ways to use duckplyr.
 
-1. To enable for individual data frames, use `as_duckplyr_df()` as the first step in your pipe.
-1. To enable for the entire session, use `methods_overwrite()`.
+1. To enable duckplyr for individual data frames, use `as_duckplyr_df()` as the first step in your pipe.
+1. To enable duckplyr for the entire session, use `methods_overwrite()`.
 
 The examples below illustrate both methods.
 See also the companion [demo repository](https://github.com/Tmonster/duckplyr_demo) for a use case with a large dataset.
 
-### Individual
+### Usage for individual data frames
 
 This example illustrates usage of duckplyr for individual data frames.
 
-```{r individual}
-# Use `as_duckplyr_df()` to enable processing with duckdb:
+Use `as_duckplyr_df()` to enable processing with duckdb:
+
+```{r}
 out <-
   palmerpenguins::penguins %>%
   # CAVEAT: factor columns are not supported yet
@@ -57,35 +82,49 @@ out <-
   mutate(bill_area = bill_length_mm * bill_depth_mm) %>%
   summarize(.by = c(species, sex), mean_bill_area = mean(bill_area)) %>%
   filter(species != "Gentoo")
+```
+
+The result is a data frame or tibble, with its own class.
 
-# The result is a data frame or tibble, with its own class.
+```{r}
 class(out)
 names(out)
+```
+
+duckdb is responsible for eventually carrying out the operations.
+Despite the late filter, the summary is not computed for the Gentoo species.
 
-# duckdb is responsible for eventually carrying out the operations.
-# Despite the late filter, the summary is not computed for the Gentoo species.
+```{r}
 out %>%
   explain()
+```
 
-# All data frame operations are supported.
-# Computation happens upon the first request.
+All data frame operations are supported.
+Computation happens upon the first request.
+
+```{r}
 out$mean_bill_area
+```
 
-# After the computation has been carried out, the results are available
-# immediately:
+After the computation has been carried out, the results are available immediately:
+
+```{r}
 out
 ```
 
-### Session-wide
+### Session-wide usage
 
 This example illustrates usage of duckplyr for all data frames in the R session.
 
-```{r session}
-# Use `methods_overwrite()` to enable processing with duckdb for all data frames:
+Use `methods_overwrite()` to enable processing with duckdb for all data frames:
+
+```{r}
 methods_overwrite()
+```
+This is the same query as above, without `as_duckplyr_df()`:
 
-# This is the same query as above, without `as_duckplyr_df()`:
+```{r}
 out <-
   palmerpenguins::penguins %>%
   # CAVEAT: factor columns are not supported yet
@@ -93,17 +132,29 @@ out <-
   mutate(bill_area = bill_length_mm * bill_depth_mm) %>%
   summarize(.by = c(species, sex), mean_bill_area = mean(bill_area)) %>%
   filter(species != "Gentoo")
+```
+
+The result is a plain tibble now:
 
-# The result is a plain tibble now:
+```{r}
 class(out)
+```
+
+Querying the number of rows also starts the computation:
 
-# Querying the number of rows also starts the computation:
+```{r}
 nrow(out)
+```
 
-# Restart R, or call `methods_restore()` to revert to the default dplyr implementation.
+Restart R, or call `methods_restore()` to revert to the default dplyr implementation.
+
+```{r}
 methods_restore()
+```
+
+dplyr is active again:
 
-# dplyr is active again:
+```{r}
 palmerpenguins::penguins %>%
   # CAVEAT: factor columns are not supported yet
   mutate(across(where(is.factor), as.character)) %>%
@@ -211,17 +262,3 @@ rel_names.dfrel <- function(rel, ...) {
 rel_names(mtcars_rel)
 ```
 
-## Installation
-
-Install duckplyr from CRAN with:
-
-``` r
-install.packages("duckplyr")
-```
-
-You can also install the development version of duckplyr from [GitHub](https://github.com/) with:
-
-``` r
-# install.packages("pak", repos = sprintf("https://r-lib.github.io/p/pak/stable/%s/%s/%s", .Platform$pkgType, R.Version()$os, R.Version()$arch))
-pak::pak("duckdblabs/duckplyr")
-```
diff --git a/README.md b/README.md
index c2936ff2..b965034f 100644
--- a/README.md
+++ b/README.md
@@ -8,9 +8,29 @@
 
-The goal of duckplyr is to provide a drop-in replacement for dplyr that uses DuckDB as a backend for fast operation. It also defines a set of generics that provide a low-level implementer’s interface for dplyr’s high-level user interface.
+The goal of duckplyr is to provide a drop-in replacement for dplyr that uses [DuckDB](https://duckdb.org/) as a backend for fast operation. DuckDB is an in-process SQL OLAP database management system.
 
-## Example
+duckplyr also defines a set of generics that provide a low-level implementer’s interface for dplyr’s high-level user interface.
+
+## Installation
+
+Install duckplyr from CRAN with:
+
+
+install.packages("duckplyr")
+
+You can also install the development version of duckplyr from R-universe:
+
+
+install.packages('duckplyr', repos = c('https://duckdblabs.r-universe.dev', 'https://cloud.r-project.org'))
+
+Or from [GitHub](https://github.com/) with:
+
+
+# install.packages("pak", repos = sprintf("https://r-lib.github.io/p/pak/stable/%s/%s/%s", .Platform$pkgType, R.Version()$os, R.Version()$arch))
+pak::pak("duckdblabs/duckplyr")
+
+## Examples
 library(conflicted)
@@ -21,17 +41,18 @@ The goal of duckplyr is to provide a drop-in replacement for dplyr that uses Duc
 
 There are two ways to use duckplyr.
 
-1.  To enable for individual data frames, use [`as_duckplyr_df()`](https://duckdblabs.github.io/duckplyr/reference/as_duckplyr_df.html) as the first step in your pipe.
-2.  To enable for the entire session, use [`methods_overwrite()`](https://duckdblabs.github.io/duckplyr/reference/methods_overwrite.html).
+1.  To enable duckplyr for individual data frames, use [`as_duckplyr_df()`](https://duckdblabs.github.io/duckplyr/reference/as_duckplyr_df.html) as the first step in your pipe.
+2.  To enable duckplyr for the entire session, use [`methods_overwrite()`](https://duckdblabs.github.io/duckplyr/reference/methods_overwrite.html).
 
 The examples below illustrate both methods. See also the companion [demo repository](https://github.com/Tmonster/duckplyr_demo) for a use case with a large dataset.
 
-### Individual
+### Usage for individual data frames
 
 This example illustrates usage of duckplyr for individual data frames.
 
+Use [`as_duckplyr_df()`](https://duckdblabs.github.io/duckplyr/reference/as_duckplyr_df.html) to enable processing with duckdb:
+
 
-# Use `as_duckplyr_df()` to enable processing with duckdb:
 out <-
   palmerpenguins::penguins %>%
   # CAVEAT: factor columns are not supported yet
@@ -39,16 +60,19 @@ This example illustrates usage of duckplyr for individual data frames.
   as_duckplyr_df() %>%
   mutate(bill_area = bill_length_mm * bill_depth_mm) %>%
   summarize(.by = c(species, sex), mean_bill_area = mean(bill_area)) %>%
-  filter(species != "Gentoo")
-
-# The result is a data frame or tibble, with its own class.
+  filter(species != "Gentoo")
+
+The result is a data frame or tibble, with its own class.
+
+
 class(out)
 #> [1] "duckplyr_df" "tbl_df"      "tbl"         "data.frame"
 names(out)
-#> [1] "species"        "sex"            "mean_bill_area"
-
-# duckdb is responsible for eventually carrying out the operations.
-# Despite the late filter, the summary is not computed for the Gentoo species.
+#> [1] "species"        "sex"            "mean_bill_area"
+
+duckdb is responsible for eventually carrying out the operations. Despite the late filter, the summary is not computed for the Gentoo species.
+
+
 out %>%
   explain()
 #> ┌───────────────────────────┐
@@ -77,7 +101,7 @@ This example illustrates usage of duckplyr for individual data frames.
 #> │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
 #> │   (species != 'Gentoo')   │
 #> │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
-#> │           EC: 0           │
+#> │          EC: 344          │
 #> └─────────────┬─────────────┘                             
 #> ┌─────────────┴─────────────┐
 #> │     R_DATAFRAME_SCAN      │
@@ -89,11 +113,12 @@ This example illustrates usage of duckplyr for individual data frames.
 #> │       bill_depth_mm       │
 #> │            sex            │
 #> │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
-#> │           EC: 0           │
-#> └───────────────────────────┘
-
-# All data frame operations are supported.
-# Computation happens upon the first request.
+#> │          EC: 344          │
+#> └───────────────────────────┘
+
+All data frame operations are supported. Computation happens upon the first request.
+
+
 out$mean_bill_area
 #> materializing:
 #> ---------------------
@@ -102,7 +127,7 @@ This example illustrates usage of duckplyr for individual data frames.
 #> Filter [!=(species, 'Gentoo')]
 #>   Aggregate [species, sex, mean(bill_area)]
 #>     Projection [species as species, island as island, bill_length_mm as bill_length_mm, bill_depth_mm as bill_depth_mm, flipper_length_mm as flipper_length_mm, body_mass_g as body_mass_g, sex as sex, "year" as year, *(bill_length_mm, bill_depth_mm) as bill_area]
-#>       r_dataframe_scan(0x107a45b78)
+#>       r_dataframe_scan(0x10c4c7628)
 #> 
 #> ---------------------
 #> -- Result Columns  --
@@ -111,29 +136,33 @@ This example illustrates usage of duckplyr for individual data frames.
 #> - sex (VARCHAR)
 #> - mean_bill_area (DOUBLE)
 #> 
-#> [1] 770.2627 656.8523 694.9360 819.7503 984.2279
-
-# After the computation has been carried out, the results are available
-# immediately:
+#> [1] 770.2627 656.8523 819.7503 694.9360 984.2279
+
+After the computation has been carried out, the results are available immediately:
+
+
 out
 #> # A tibble: 5 × 3
 #>   species   sex    mean_bill_area
 #>   <chr>     <chr>           <dbl>
 #> 1 Adelie    male             770.
 #> 2 Adelie    female           657.
-#> 3 Adelie    NA               695.
-#> 4 Chinstrap female           820.
+#> 3 Chinstrap female           820.
+#> 4 Adelie    NA               695.
 #> 5 Chinstrap male             984.
-### Session-wide
+### Session-wide usage
 
 This example illustrates usage of duckplyr for all data frames in the R session.
 
+Use [`methods_overwrite()`](https://duckdblabs.github.io/duckplyr/reference/methods_overwrite.html) to enable processing with duckdb for all data frames:
+
+
+methods_overwrite()
+
+This is the same query as above, without [`as_duckplyr_df()`](https://duckdblabs.github.io/duckplyr/reference/as_duckplyr_df.html):
+
-# Use `methods_overwrite()` to enable processing with duckdb for all data frames:
-methods_overwrite()
-
-# This is the same query as above, without `as_duckplyr_df()`:
 out <-
   palmerpenguins::penguins %>%
   # CAVEAT: factor columns are not supported yet
@@ -141,12 +170,19 @@ This example illustrates usage of duckplyr for all data frames in the R session.
   mutate(bill_area = bill_length_mm * bill_depth_mm) %>%
   summarize(.by = c(species, sex), mean_bill_area = mean(bill_area)) %>%
   filter(species != "Gentoo")
-
-# The result is a plain tibble now:
+#> Error processing with relational.
+#> Caused by error in `duckdb_rel_from_df()`:
+#> ! Can't convert factor columns to relational. Affected column: `species`.
+
+The result is a plain tibble now:
+
+
 class(out)
-#> [1] "tbl_df"     "tbl"        "data.frame"
-
-# Querying the number of rows also starts the computation:
+#> [1] "tbl_df"     "tbl"        "data.frame"
+
+Querying the number of rows also starts the computation:
+
+
 nrow(out)
 #> materializing:
 #> ---------------------
@@ -155,7 +191,7 @@ This example illustrates usage of duckplyr for all data frames in the R session.
 #> Filter [!=(species, 'Gentoo')]
 #>   Aggregate [species, sex, mean(bill_area)]
 #>     Projection [species as species, island as island, bill_length_mm as bill_length_mm, bill_depth_mm as bill_depth_mm, flipper_length_mm as flipper_length_mm, body_mass_g as body_mass_g, sex as sex, "year" as year, *(bill_length_mm, bill_depth_mm) as bill_area]
-#>       r_dataframe_scan(0x1254b4b58)
+#>       r_dataframe_scan(0x10a81d568)
 #> 
 #> ---------------------
 #> -- Result Columns  --
@@ -163,12 +199,16 @@ This example illustrates usage of duckplyr for all data frames in the R session.
 #> - species (VARCHAR)
 #> - sex (VARCHAR)
 #> - mean_bill_area (DOUBLE)
-#> [1] 5
-
-# Restart R, or call `methods_restore()` to revert to the default dplyr implementation.
-methods_restore()
-
-# dplyr is active again:
+#> [1] 5
+
+Restart R, or call [`methods_restore()`](https://duckdblabs.github.io/duckplyr/reference/methods_overwrite.html) to revert to the default dplyr implementation.
+
+
+methods_restore()
+
+dplyr is active again:
+
+
 palmerpenguins::penguins %>%
   # CAVEAT: factor columns are not supported yet
   mutate(across(where(is.factor), as.character)) %>%
@@ -339,16 +379,3 @@ This package also provides generics, for which other packages may then implement
 
 rel_names(mtcars_rel)
 #> [1] "mpg"  "cyl"  "disp" "hp"
-
-## Installation
-
-Install duckplyr from CRAN with:
-
-
-install.packages("duckplyr")
-
-You can also install the development version of duckplyr from [GitHub](https://github.com/) with:
-
-
-# install.packages("pak", repos = sprintf("https://r-lib.github.io/p/pak/stable/%s/%s/%s", .Platform$pkgType, R.Version()$os, R.Version()$arch))
-pak::pak("duckdblabs/duckplyr")