Proposal for updating the R container #78

Open
rvalieris opened this issue May 18, 2024 · 2 comments

Hello there, I'm interested in helping update the R container.
After spending some time working on the build errors, I have written a patch,
but I want to discuss the changes before opening a PR because they are significant:

  • Upgrade R to 4.4
    the latest R available on Kaggle is 3 years old now, so it's about time we get an update ;)

  • Drop mxnet
    as of 2023-09, the project has been retired.

  • Remove packages: rgdal, rgeos
    these packages have been removed from CRAN:
    Archived on 2023-10-16 at the request of the maintainer. Consider using 'sf' or 'terra' instead.

  • Install the latest versions of the packages: randomForest, terra, ranger, imager
    the versions of these packages are pinned in package_installs.R.
    the comments imply there is context for the pins (an internal bug tracker?), but it's not clear why they exist or whether they are still needed.

  • Use pak to install packages
    The build is currently failing here:

r$> utils:::.make_dependency_list(pkgs, allPackages, recursive = TRUE)
Error in tools:::.extract_dependency_package_names(x) :
  non-character argument

r$> traceback()
4: tools:::.extract_dependency_package_names(x)
3: unique.default(tools:::.extract_dependency_package_names(x))
2: .clean_up_dependencies(info2[i, ])
1: utils:::.make_dependency_list(pkgs, allPackages, recursive = TRUE)

package_installs.R uses utils:::.make_dependency_list.
This is an internal function, so its behavior can change without warning,
which is what I think happened here.
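
For illustration, a dependency list can also be resolved through the exported `tools::package_dependencies()`, which is part of R's documented API. This is a minimal sketch, not what the patch does: the package database below is a hypothetical three-package stand-in for the matrix that `available.packages()` returns, so the example runs without network access.

```r
# Hypothetical stand-in for available.packages(): one row per package,
# with the dependency fields that tools::package_dependencies() reads.
db <- rbind(
  c(Package = "A", Depends = NA,              Imports = "B (>= 1.0)", LinkingTo = NA),
  c(Package = "B", Depends = "R (>= 3.5), C", Imports = NA,           LinkingTo = NA),
  c(Package = "C", Depends = NA,              Imports = NA,           LinkingTo = NA)
)
rownames(db) <- db[, "Package"]

# Resolve the hard dependencies of "A" recursively using the exported API.
deps <- tools::package_dependencies(
  packages  = "A",
  db        = db,
  which     = c("Depends", "Imports", "LinkingTo"),
  recursive = TRUE
)

print(deps)
```

In a real build, `db` could be left as `NULL`, in which case the function queries the configured repositories through `available.packages()`.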

pak can do parallel downloads and parallel installation of packages, among other features.

I think that by using pak most of the package_installs.R code becomes redundant, and the script can be simplified significantly.
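
As a rough sketch of what a pak-based install step could look like (an assumption about the shape of the script, not the contents of the patch): `install_with_pak()` and its `dry_run` flag are hypothetical names, and the package list simply reuses four packages mentioned above. `pak::pkg_install()` resolves dependencies itself, so no hand-built dependency list is needed.

```r
# Hypothetical package list; the real one lives in package_installs.R.
pkgs <- c("randomForest", "terra", "ranger", "imager")

# Hypothetical helper: install a vector of packages with pak.
# With dry_run = TRUE it only reports what would be installed.
install_with_pak <- function(pkgs, dry_run = TRUE) {
  if (dry_run) {
    message("Would install: ", paste(pkgs, collapse = ", "))
    return(invisible(pkgs))
  }
  # Bootstrap pak from CRAN if it is not available yet.
  if (!requireNamespace("pak", quietly = TRUE)) {
    install.packages("pak", repos = "https://cloud.r-project.org")
  }
  # ask = FALSE avoids the interactive confirmation in a container build.
  pak::pkg_install(pkgs, ask = FALSE)
  invisible(pkgs)
}

# Dry run by default; pass dry_run = FALSE to actually install.
install_with_pak(pkgs)
```

A pinned version, if one turns out to be needed, can be expressed in pak's reference syntax as `"pkg@version"` in the same vector.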

Of course, none of these changes are required, but I think they would be an improvement. Let me know what you think.

@calderjo calderjo self-assigned this Jun 5, 2024

calderjo commented Jun 5, 2024

Hi rvalieris, thank you for reaching out and starting this conversation. With all the cool things going on in the Python world, we most definitely left the R image unattended. I am starting to take ownership of both the Python and R Docker images, so I'm looking forward to working with the community on improving the R image.

I set up some goals for myself in the near future (in order):

  • unblock the build and get a new release going with the current R version (2 weeks from now).
  • upgrade to a newer R version, aiming for 4.4 (no timeline). I'm not 100% up to date on what is blocking this, so I need to review it.

I am most definitely open to receiving and reviewing PRs from you.


calderjo commented Jun 6, 2024

hey, we might be a step closer to that R 4.4.0 image than I expected

#79
