Skip to content

difuture-lmu/dsROCGLM

Repository files navigation

ROC-GLM for DataSHIELD

THE PACKAGE HAS MOVED TO THE dsBinVal REPOSITORY!

The package provides functionality to conduct and visualize ROC analysis on decentralized data. The basis is the DataSHIELD](https://www.datashield.org/) infrastructure for distributed computing. This package provides the calculation of the ROC-GLM as well as AUC confidence intervals. In order to calculate the ROC-GLM it is necessry to push models and predict them at the servers. This is done automatically by the base package dsPredictBase. Note that DataSHIELD uses an option datashield.privacyLevel to indicate the minimal amount of numbers required to be allowed to share an aggregated value of these numbers. Instead of setting the option, we directly retrieve the privacy level from the DESCRIPTION file each time a function calls for it. This options is set to 5 by default.

Installation

At the moment, there is no CRAN version available. Install the development version from GitHub:

remotes::install_github("difuture-lmu/dsROCGLM")

Register methods

It is necessary to register the assign and aggregate methods in the OPAL administration. These methods are registered automatically when publishing the package on OPAL (see DESCRIPTION).

Note that the package needs to be installed at both locations, the server and the analysts machine.

Usage

A more sophisticated example is available here.

library(DSI)
#> Loading required package: progress
#> Loading required package: R6
library(DSOpal)
#> Loading required package: opalr
#> Loading required package: httr
library(dsBaseClient)

library(dsPredictBase)
library(dsROCGLM)

Log into DataSHIELD server

builder = newDSLoginBuilder()

surl     = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"

builder$append(
  server   = "ds1",
  url      = surl,
  user     = username,
  password = password
)
builder$append(
  server   = "ds2",
  url      = surl,
  user     = username,
  password = password
)

connections = datashield.login(logins = builder$build(), assign = TRUE)
#>
#> Logging into the collaborating servers

Assign iris and validation vector at DataSHIELD (just for testing)

datashield.assign(connections, "iris", quote(iris))
datashield.assign(connections, "y", quote(c(rep(1, 50), rep(0, 100))))

Load test model, push to DataSHIELD, and calculate predictions

# Model predicts if species of iris is setosa or not.
iris$y = ifelse(iris$Species == "setosa", 1, 0)
mod = glm(y ~ Sepal.Length, data = iris, family = binomial())

# Push the model to the DataSHIELD servers using `dsPredictBase`:
pushObject(connections, mod)

# Calculate scores and save at the servers using `dsPredictBase`:
pfun =  "predict(mod, newdata = D, type = 'response')"
predictModel(connections, mod, "pred", "iris", predict_fun = pfun)

datashield.symbols(connections)
#> $ds1
#> [1] "iris" "mod"  "pred" "y"
#>
#> $ds2
#> [1] "iris" "mod"  "pred" "y"

Calculate l2-sensitivity

# In order to securely calculate the ROC-GLM, we have to assess the
# l2-sensitivity to set the privacy parameters of differential
# privacy adequately:
l2s = dsL2Sens(connections, "iris", "pred")
l2s
#> [1] 0.1280699

# Due to the results presented in https://arxiv.org/abs/2203.10828, we set the privacy parameters to
# - epsilon = 0.2, delta = 0.1 if i      l2s <= 0.01
# - epsilon = 0.3, delta = 0.4 if 0.01 < l2s <= 0.03
# - epsilon = 0.5, delta = 0.3 if 0.03 < l2s <= 0.05
# - epsilon = 0.5, delta = 0.5 if 0.05 < l2s <= 0.07
# - epsilon = 0.5, delta = 0.5 if 0.07 < l2s BUT results may be not good!

Calculate ROC-GLM

roc_glm = dsROCGLM(connections, truth_name = "y", pred_name = "pred",
  dat_name = "iris", seed_object = "y")
#>
#> [2022-04-04 12:47:46] L2 sensitivity is: 0.1281
#> Warning in dsROCGLM(connections, truth_name = "y", pred_name = "pred", dat_name
#> = "iris", : l2-sensitivity may be too high for good results! Epsilon = 0.5 and
#> delta = 0.5 is used which may lead to bad results.
#>
#> [2022-04-04 12:47:47] Setting: epsilon = 0.5 and delta = 0.5
#>
#> [2022-04-04 12:47:47] Initializing ROC-GLM
#>
#> [2022-04-04 12:47:47] Host: Received scores of negative response
#> [2022-04-04 12:47:47] Receiving negative scores
#> [2022-04-04 12:47:49] Host: Pushing pooled scores
#> [2022-04-04 12:47:50] Server: Calculating placement values and parts for ROC-GLM
#> [2022-04-04 12:47:52] Server: Calculating probit regression to obtain ROC-GLM
#> [2022-04-04 12:47:53] Deviance of iter1=137.2431
#> [2022-04-04 12:47:54] Deviance of iter2=121.5994
#> [2022-04-04 12:47:56] Deviance of iter3=147.7237
#> [2022-04-04 12:47:57] Deviance of iter4=140.4008
#> [2022-04-04 12:47:58] Deviance of iter5=129.2244
#> [2022-04-04 12:48:00] Deviance of iter6=123.9979
#> [2022-04-04 12:48:01] Deviance of iter7=123.1971
#> [2022-04-04 12:48:02] Deviance of iter8=124.1615
#> [2022-04-04 12:48:04] Deviance of iter9=124.5356
#> [2022-04-04 12:48:05] Deviance of iter10=124.5503
#> [2022-04-04 12:48:06] Deviance of iter11=124.5504
#> [2022-04-04 12:48:08] Deviance of iter12=124.5504
#> [2022-04-04 12:48:08] Host: Finished calculating ROC-GLM
#> [2022-04-04 12:48:08] Host: Cleaning data on server
#> [2022-04-04 12:48:09] Host: Calculating AUC and CI
#> [2022-04-04 12:48:18] Finished!
roc_glm
#>
#> ROC-GLM after Pepe:
#>
#>  Binormal form: pnorm(2.51 + 1.55*qnorm(t))
#>
#>  AUC and 0.95 CI: [0.86----0.91----0.95]

plot(roc_glm)

Deploy information:

Build by root (Darwin) on 2022-04-04 12:48:23.

This readme is built automatically after each push to the repository. Hence, it also is a test if the functionality of the package works also on the DataSHIELD servers. We also test these functionality in tests/testthat/test_on_active_server.R. The system information of the local and remote servers are as followed:

  • Local machine:
    • R version: R version 4.1.3 (2022-03-10)
    • Version of DataSHELD client packages:
Package Version
DSI 1.3.0
DSOpal 1.3.1
dsBaseClient 6.1.1
dsPredictBase 0.0.1
dsROCGLM 1.0.0
  • Remote DataSHIELD machines:
    • R version of ds1: R version 4.1.1 (2021-08-10)
    • R version of ds2: R version 4.1.1 (2021-08-10)
    • Version of server packages:
Package ds1: Version ds2: Version
dsBase 6.1.1 6.1.1
resourcer 1.1.1 1.1.1
dsPredictBase 0.0.1 0.0.1
dsROCGLM 1.0.0 1.0.0