Starts sparkxgb section
edgararuiz committed Apr 22, 2024
1 parent 484ba0b commit dfe62d6
Showing 1 changed file with 33 additions and 15 deletions.
48 changes: 33 additions & 15 deletions _posts/2024-04-22-sparklyr-updates/sparklyr-updates.Rmd
@@ -28,6 +28,9 @@ knitr::opts_chunk$set(

 ## Highlights

+`sparklyr` and friends have been getting some important updates in the past few
+months, here are some highlights:
+
 * Databricks Connect v2 now supports running native R code in Spark, via
 `pysparklyr`.

@@ -67,27 +70,25 @@ for the next time you run the call.
 A full article about this new capability is available here:
 [Run R inside Databricks Connect](https://spark.posit.co/deployment/databricks-connect-udfs.html)

-## sparklyr 1.8.5
-
-### Fixes
-
-- Fixes quoting issue with `dbplyr` 2.5.0 (#3429)
-
-- Fixes Windows OS identification (#3426)
+## sparkxgb

-### Package improvements
+The `sparkxgb` is an extension of `sparklyr`. It enables integration with
+[XGBoost](https://xgboost.readthedocs.io/en/stable/). The current CRAN release
+does not support the more recent versions of XGBoost. This limitation has recently
+prompted a full refresh of `sparkxgb`. Here is a summary of the improvements,
+currently in the [development version of the package](https://github.com/rstudio/sparkxgb):

-- Removes dependency on `tibble`, all calls are now redirected to `dplyr` (#3399)
+- The `xgboost_classifier()` and `xgboost_regressor()` functions no longer
+pass values from two existing arguments:

-- Removes dependency on `rapddirs` (#3401):
-- Backwards compatibility with `sparklyr` 0.5 is no longer needed
-- Replicates selection of cache directory
+- `sketch_eps`
+- `timeout_request_workers`

-- Converts `spark_apply()` to a method (#3418)
+These are parameters that XGBoost has deprecated. This is the main reason why
+R users are seeing errors when using the version currently on CRAN.

-## sparkxgb

-- Avoids sending two deprecated parameters to XGBoost. The default arguments in
+Avoids sending two deprecated parameters to XGBoost. The default arguments in
 the R function are NULL, and it will return an error message if the call intends
 to use them:

@@ -118,3 +119,20 @@ Spark session
 package moving forward.


+## sparklyr 1.8.5
+
+### Fixes
+
+- Fixes quoting issue with `dbplyr` 2.5.0 (#3429)
+
+- Fixes Windows OS identification (#3426)
+
+### Package improvements
+
+- Removes dependency on `tibble`, all calls are now redirected to `dplyr` (#3399)
+
+- Removes dependency on `rapddirs` (#3401):
+- Backwards compatibility with `sparklyr` 0.5 is no longer needed
+- Replicates selection of cache directory
+
+- Converts `spark_apply()` to a method (#3418)
