Regularized Network-Based Variable Selection
Network-based regularization has been successful in variable selection for high-dimensional biological data because it incorporates the correlations among genomic features. This package provides procedures for network-based variable selection in generalized linear models (Ren et al. (2017) and Ren et al. (2019)). Continuous, binary, and survival responses are supported, and robust network-based methods are available for continuous and survival responses.
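To make the phrase "incorporate correlations among genomic features" more concrete, here is a minimal base-R sketch of one common form of network penalty. It builds a soft-thresholded adjacency measure a_jk = |r_jk|^r from the pairwise correlations r_jk, then adds a quadratic term that pulls the coefficients of strongly connected features toward each other (or toward opposite values when the correlation is negative). This is an illustration under stated assumptions, not regnet's internal code. The helper names adjacency_matrix() and network_penalty() are hypothetical. The power form of the adjacency is our reading of the r argument used in the examples, and the exact quadratic term is assumed. The full regnet objective also includes a sparsity penalty governed by the first tuning parameter, which is omitted here. See Ren et al. (2017) for the actual formulation.

```r
# Hypothetical helpers for illustration only; NOT functions exported by regnet.
# Assumption: adjacency a_jk = |cor(x_j, x_k)|^r and sign s_jk = sign(cor(x_j, x_k)).
adjacency_matrix <- function(X, r = 5) {
  R <- cor(X)          # pairwise Pearson correlations between columns of X
  A <- abs(R)^r        # raising to the power r shrinks weak correlations toward 0
  diag(A) <- 0         # no self-loops in the network
  list(A = A, S = sign(R))
}

# Assumed quadratic network term: (lambda2 / 2) * sum_{j<k} a_jk * (beta_j - s_jk * beta_k)^2
network_penalty <- function(beta, net, lambda2) {
  p <- length(beta)
  total <- 0
  for (j in seq_len(p - 1)) {
    for (k in (j + 1):p) {
      total <- total + net$A[j, k] * (beta[j] - net$S[j, k] * beta[k])^2
    }
  }
  lambda2 / 2 * total
}

# Toy data: x1 and x2 are strongly positively correlated, x3 is independent noise
set.seed(1)
z <- rnorm(100)
X <- cbind(z + rnorm(100, sd = 0.1), z + rnorm(100, sd = 0.1), rnorm(100))
net <- adjacency_matrix(X, r = 5)

# Coefficients that agree on the correlated pair (x1, x2) incur a smaller penalty
network_penalty(c(1, 1, 0), net, lambda2 = 1) < network_penalty(c(1, -1, 0), net, lambda2 = 1)
```

The double loop mirrors the sum over pairs j < k directly. For many features, a vectorized version using `upper.tri()` would be faster, but it would be harder to read.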
- To install the development version from GitHub, run these two lines of code in R:
```r
install.packages("devtools")
devtools::install_github("jrhub/regnet")
```
- Released versions of regnet are available on CRAN and can be installed within R via:
```r
install.packages("regnet")
```
Example: survival response
```r
library(regnet)
data(SurvExample)
X = rgn.surv$X
Y = rgn.surv$Y
clv = c(1:5)  # variables 1 to 5 are clinical variables; we choose not to penalize them here
out = cv.regnet(X, Y, response = "survival", penalty = "network", clv = clv, robust = TRUE, verbo = TRUE)
out$lambda  # tuning parameters selected by cross-validation
fit = regnet(X, Y, "survival", "network", out$lambda[1,1], out$lambda[1,2], clv = clv, robust = TRUE)
index = which(rgn.surv$beta[-(1:6)] != 0)  # [-(1:6)] removes the intercept and the clinical variables, which are not subject to selection
pos = which(fit$coeff[-(1:6)] != 0)
tp = length(intersect(index, pos))  # true positives
fp = length(pos) - tp               # false positives
list(tp = tp, fp = fp)
```
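The true/false-positive count above, which is repeated in the binary example, can be wrapped in a small helper. `selection_counts()` is a hypothetical name and not part of regnet. As in the examples, it assumes that the first `n_skip` coefficients (the intercept plus any `clv` columns) are excluded from the comparison.

```r
# Hypothetical helper (NOT exported by regnet): counts true and false positives
# among the selected variables, ignoring the first n_skip coefficients
# (e.g. the intercept and unpenalized clinical variables).
selection_counts <- function(beta_true, beta_hat, n_skip = 1) {
  stopifnot(length(beta_true) == length(beta_hat))
  keep <- setdiff(seq_along(beta_true), seq_len(n_skip))  # also correct when n_skip = 0
  truth  <- which(beta_true[keep] != 0)  # positions of truly nonzero effects
  chosen <- which(beta_hat[keep] != 0)   # positions selected by the fit
  tp <- length(intersect(truth, chosen))
  list(tp = tp, fp = length(chosen) - tp)
}

# Toy check: after skipping the intercept, the truth is {1, 3} and the fit selects {1, 2}
selection_counts(c(0.5, 1, 0, 2, 0), c(0.4, 0.9, 0.3, 0, 0), n_skip = 1)  # tp = 1, fp = 1
```

For the survival example, the equivalent call would be `selection_counts(rgn.surv$beta, fit$coeff, n_skip = 6)`. For the binary example, it would be `selection_counts(rgn.logi$beta, fit$coeff, n_skip = 1)`. The `setdiff()` indexing avoids a common R pitfall: `x[-seq_len(0)]` returns an empty vector rather than all of `x`.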
Example: binary response
```r
library(regnet)
data(LogisticExample)
X = rgn.logi$X
Y = rgn.logi$Y
out = cv.regnet(X, Y, response = "binary", penalty = "network", folds = 5, r = 4.5, robust = FALSE)
out$lambda  # tuning parameters selected by cross-validation
fit = regnet(X, Y, "binary", "network", out$lambda[1,1], out$lambda[1,2], r = 4.5)
index = which(rgn.logi$beta[-1] != 0)  # [-1] removes the intercept
pos = which(fit$coeff[-1] != 0)
tp = length(intersect(index, pos))  # true positives
fp = length(pos) - tp               # false positives
list(tp = tp, fp = fp)
```
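Example: continuous response
```r
library(regnet)
data(ContExample)
X = rgn.tcga$X
Y = rgn.tcga$Y
clv = c(1:2)  # columns 1 and 2 are not penalized
fit = regnet(X, Y, "continuous", "network", rgn.tcga$lamb1, rgn.tcga$lamb2, clv = clv, alpha.i = 0.5, robust = FALSE)
net = plot(fit)  # network structure among the identified variables
subs = plot(fit, subnetworks = TRUE, vsize = 20, labelDist = 3, theta = 5)
```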
News
- Added robust network regularization for the continuous response.
- Added a generic function plot() for plotting the network structures among the identified genetic variants.
- Multi-core computation has been removed for CRAN submission.
- (Earlier release) cv.regnet() gained multi-core support via the OpenMP library.
Based on users’ feedback, we have:
- Added more data-format checks to help users make sure their data are in the correct format.
- Provided more troubleshooting information in the documentation.
- Two new, easy-to-use, integrated interfaces: cv.regnet() and regnet().
- New methods for continuous and survival responses.
- The new “clv” argument allows the X matrix to contain clinical variables that are not subject to penalty.
- A C++ implementation of the coordinate descent algorithms, which significantly speeds up the cross-validation functions in this package.
This package provides implementations of the methods proposed in:
- Ren, J., He, T., Li, Y., Liu, S., Du, Y., Jiang, Y. and Wu, C. (2017). Network-based regularization for high dimensional SNP data in the case-control study of Type 2 diabetes. BMC Genetics, 18(1): 44.
- Ren, J., Du, Y., Li, S., Ma, S., Jiang, Y. and Wu, C. (2019). Robust network-based regularization and variable selection for high dimensional genomics data in cancer prognosis. Genetic Epidemiology, 43: 276–291.
- Wu, C. and Ma, S. (2015). A selective review of robust variable selection with applications in bioinformatics. Briefings in Bioinformatics, 16(5): 873–883.
- Wu, C., Shi, X., Cui, Y. and Ma, S. (2015). A penalized robust semiparametric approach for gene-environment interactions. Statistics in Medicine, 34(30): 4016–4030.
- Wu, C., Jiang, Y., Ren, J., Cui, Y. and Ma, S. (2018). Dissecting gene-environment interactions: A penalized robust approach accounting for hierarchical structures. Statistics in Medicine, 37: 437–456.