From dfe62d6c97673ec57f71aa675cf0d68c47d04a77 Mon Sep 17 00:00:00 2001
From: Edgar Ruiz
Date: Mon, 22 Apr 2024 17:33:47 -0500
Subject: [PATCH] Starts sparkxgb section

---
 .../sparklyr-updates.Rmd | 48 +++++++++++++------
 1 file changed, 33 insertions(+), 15 deletions(-)

diff --git a/_posts/2024-04-22-sparklyr-updates/sparklyr-updates.Rmd b/_posts/2024-04-22-sparklyr-updates/sparklyr-updates.Rmd
index 0a6f2244..d72206ee 100644
--- a/_posts/2024-04-22-sparklyr-updates/sparklyr-updates.Rmd
+++ b/_posts/2024-04-22-sparklyr-updates/sparklyr-updates.Rmd
@@ -28,6 +28,9 @@ knitr::opts_chunk$set(
 
 ## Highlights
 
+`sparklyr` and friends have been getting some important updates in the past few
+months. Here are some highlights:
+
 * Databricks Connect v2 now supports running
 native R code in Spark, via `pysparklyr`.
 
@@ -67,27 +70,25 @@ for the next time you run the call.
 A full article about this new capability is available here:
 [Run R inside Databricks Connect](https://spark.posit.co/deployment/databricks-connect-udfs.html)
 
-## sparklyr 1.8.5
-
-### Fixes
-
-- Fixes quoting issue with `dbplyr` 2.5.0 (#3429)
-
-- Fixes Windows OS identification (#3426)
+## sparkxgb
 
-### Package improvements
+`sparkxgb` is an extension of `sparklyr`. It enables integration with
+[XGBoost](https://xgboost.readthedocs.io/en/stable/). The current CRAN release
+does not support the more recent versions of XGBoost. This limitation has recently
+prompted a full refresh of `sparkxgb`. Here is a summary of the improvements,
+currently in the [development version of the package](https://github.com/rstudio/sparkxgb):
 
-- Removes dependency on `tibble`, all calls are now redirected to `dplyr` (#3399)
+- The `xgboost_classifier()` and `xgboost_regressor()` functions no longer
+pass values from two existing arguments:
 
-- Removes dependency on `rapddirs` (#3401):
-    - Backwards compatibility with `sparklyr` 0.5 is no longer needed
-    - Replicates selection of cache directory
+    - `sketch_eps`
+    - `timeout_request_workers`
 
-- Converts `spark_apply()` to a method (#3418)
+These are parameters that XGBoost has deprecated. This is the main reason why
+R users are seeing errors when using the version currently on CRAN.
 
-## sparkxgb
 
-- Avoids sending two deprecated parameters to XGBoost. The default arguments in
+Avoids sending two deprecated parameters to XGBoost. The default arguments in
 the R function are NULL, and it will return an error message if the call
 intends to use them:
 
@@ -118,3 +119,20 @@ Spark session package moving forward.
 
 
 
+## sparklyr 1.8.5
+
+### Fixes
+
+- Fixes quoting issue with `dbplyr` 2.5.0 (#3429)
+
+- Fixes Windows OS identification (#3426)
+
+### Package improvements
+
+- Removes dependency on `tibble`; all calls are now redirected to `dplyr` (#3399)
+
+- Removes dependency on `rappdirs` (#3401):
+    - Backwards compatibility with `sparklyr` 0.5 is no longer needed
+    - Replicates selection of cache directory
+
+- Converts `spark_apply()` to a method (#3418)
\ No newline at end of file