RGMQL: Scalable and interoperable computing of heterogeneous omics big data and metadata in R/Bioconductor

DEIB-GECO/RGMQL

RGMQL

API for interactively calling the GMQL engine from R/Bioconductor

RGMQL on Bioconductor

About

RGMQL is an R/Bioconductor package that provides a set of specialized functions to extract, combine, process and compare omics datasets and their metadata from different sources, whether local or remote. RGMQL is built on the GenoMetric Query Language (GMQL) data management and computational engine. It can use the GMQL open curated repository and its cloud-based resources, and it can outsource computational tasks to GMQL remote services. It also overcomes the limits of the GMQL declarative syntax by offering a procedural approach to handling omics data within the R/Bioconductor environment. Above all, it provides full interoperability with other packages of the R/Bioconductor framework and can be extended to the most widely used genomic data structures and processing functions.

Requirements

The library requires the following:

  • R version 3.4.2 or higher
  • Java version 1.8 or higher
  • The JAVA_HOME environment variable set

Using the latest version of RStudio is recommended.
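The Java requirements above can be checked from a terminal before installing the package. The following is a minimal sketch, not part of RGMQL: the function names `java_major`, `java_meets_minimum` and `check_java_env` are hypothetical helpers introduced here for illustration, and the sketch assumes a POSIX shell with `sed`, `cut` and `head` available.

```shell
#!/bin/sh
# Hypothetical helper: turn a Java version string into its major number.
# Legacy scheme: "1.8.0_292" -> 8. Modern scheme: "11.0.13" -> 11.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | sed 's/[^0-9].*//' ;;
  esac
}

# Hypothetical helper: succeed when the version meets the Java 1.8 minimum.
java_meets_minimum() {
  [ "$(java_major "$1")" -ge 8 ] 2>/dev/null
}

# Report on the current environment (JAVA_HOME set, java present, version ok).
check_java_env() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "no java executable under $JAVA_HOME/bin"
    return 1
  fi
  # `java -version` writes to stderr; the version is the quoted token on line 1.
  version=$("$JAVA_HOME/bin/java" -version 2>&1 | head -n 1 | cut -d'"' -f2)
  if java_meets_minimum "$version"; then
    echo "Java $version satisfies the 1.8 minimum"
  else
    echo "Java $version is older than 1.8"
    return 1
  fi
}
```

After sourcing the file, running `check_java_env` prints a one-line diagnosis and returns a non-zero status when something is missing.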

Structure

RGMQL/
|-- Example of workflows/
|   |-- EXAMPLES.Rproj
|   |-- use_case_1.Rmd
|   |-- use_case_2.Rmd
|   |-- use_case_3.Rmd
|   |-- use_case_1.html
|   |-- use_case_2.html
|   |-- use_case_3.html
|   |-- ....
|-- R/
|-- inst/
|   |-- example/
|   |-- NEWS
|-- man/
|-- vignettes/
|   |-- RGMQL-vignette.R
|   |-- RGMQL-vignette.Rmd
|   |-- RGMQL-vignette.html
|   |-- american-medical-association-no-et-al.csl
|   |-- bibliography.bib
|   |-- ....
|-- DESCRIPTION
|-- NAMESPACE
|-- README.md

OSX Settings

Before Catalina

Edit the .bash_profile file and add the JAVA_HOME environment variable (no spaces around the equals sign):

export JAVA_HOME=<java_path>

export PATH=$PATH

After Catalina

Since macOS Catalina the default shell is Zsh, so edit (or create) the .zshrc file and add the JAVA_HOME environment variable (no spaces around the equals sign):

export JAVA_HOME=<java_path>

export PATH=$PATH

Finally, in both cases, edit /etc/paths and add:

$JAVA_HOME/bin
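After opening a new terminal, it is worth confirming that the settings took effect. The sketch below is an illustration under assumptions, not part of RGMQL: `path_contains` and `find_java_home` are hypothetical helper names, and `find_java_home` relies on the macOS utility `/usr/libexec/java_home`, which prints the home directory of an installed JDK and simply yields nothing on systems where the utility is absent.

```shell
#!/bin/sh
# Hypothetical helper: succeed when directory $1 is an entry of the
# colon-separated list $2 (typically "$PATH").
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Hypothetical helper: print a candidate value for <java_path> on macOS.
# An optional argument (e.g. 11) restricts the search to that Java version.
find_java_home() {
  if [ -x /usr/libexec/java_home ]; then
    if [ -n "${1:-}" ]; then
      /usr/libexec/java_home -v "$1" 2>/dev/null
    else
      /usr/libexec/java_home 2>/dev/null
    fi
  fi
}
```

For example, `path_contains "$JAVA_HOME/bin" "$PATH" && echo "JDK bin directory is on PATH"` reports whether the JDK binaries are reachable, and `find_java_home 11` can suggest a value for `<java_path>`.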

Errors

Be aware that running RGMQL with a very recent Java version (e.g., Java 17) on macOS Mojave currently produces errors. In contrast, macOS Big Sur runs RGMQL smoothly even with Java 17.

To work around this issue, we suggest that macOS Mojave users configure rJava to use an older version, such as Java 11, by running the following command:

sudo R CMD javareconf JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-11.0.13.jdk/Contents/Home

Then verify, from R, that the Java runtime loaded by rJava is actually version 11, by running:

library(rJava)

.jinit()

.jcall("java/lang/System", "S", "getProperty", "java.runtime.version")

Windows Settings

Create the JAVA_HOME environment variable:

  • Right-click on This PC
  • Click on Advanced system settings
  • Go to the Advanced tab and click on Environment Variables
  • Create a JAVA_HOME variable set to the JDK path

Errors

Be aware that during a local-processing execution the following error message may arise:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, 
: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0 in stage 7.0 failed 1 times, most recent failure: 
Lost task 0.0 in stage 7.0 (TID 59, localhost, executor driver): 
java.io.IOException: (null) entry in command string: null chmod 0644

This happens because some Hadoop binary files are missing on 64-bit Windows. In this case you need to:

  • Open DownGit
  • Paste the URL https://github.com/steveloughran/winutils/tree/master/hadoop-2.8.1 and download the hadoop-2.8.1 folder
  • Create a folder at a path of your choice (for example, C:\Program Files\hadoop\bin)
  • Copy the files from the downloaded hadoop-2.8.1 folder into the folder you just created
  • Create the environment variable HADOOP_HOME, set to the path of the folder where you copied the binaries

or

  • Go to https://github.com/steveloughran/winutils and download the repository
  • Create a folder at a path of your choice (for example, C:\Program Files\hadoop\bin)
  • Copy the files from the repository folder hadoop-2.8.1 into the folder you just created
  • Create the environment variable HADOOP_HOME, set to the path of the folder where you copied the binaries
