Usage
The Reference Manual contains a detailed description of all the functions available in L0Learn. In what follows, we give a brief tour of the main functions and settings.
The main function in L0Learn is the fit function, which has the following interface and default values:
L0Learn.fit(X,y, Loss="SquaredError", Penalty="L0", Algorithm="CD", MaxSuppSize=100, NLambda=100)
- X is the data matrix and y is the response vector.
- Loss specifies the loss function. Currently, we only officially support "SquaredError". We will be adding support for other loss functions soon.
- Penalty specifies the penalty type. The following three penalties are supported: "L0", "L0L2", and "L0L1". Note: when using "L0L2" or "L0L1", the number of grid points for the parameter gamma should be specified using the parameter NGamma. Moreover, for "L0L2", the maximum and minimum values of gamma should be specified using GammaMax and GammaMin, respectively.
- Algorithm specifies the optimization algorithm. This choice can have a significant effect on the quality of the solutions. Currently, we support the following two algorithms:
- CD: A Coordinate Descent-type algorithm with all the tweaks and heuristics discussed in our paper.
- CDPSI: This algorithm combines Coordinate Descent and Local Combinatorial Search to escape weak local minima. It typically leads to higher-quality solutions compared to CD, at the cost of additional running time. This is the CD-PSI(1) algorithm introduced in our paper.
- MaxSuppSize specifies the maximum support size in the regularization path after which the algorithm terminates. The toolkit's internals optimize the running time based on this parameter (this choice can affect the type of optimization algorithm used). We recommend experimenting with small values first (e.g., 5% of p) as L0-regularization typically selects a small portion of the features.
- NLambda is the number of Lambda grid points. Note: The actual values of Lambda are data-dependent and are computed automatically by the algorithm.
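As an illustration of these settings, the following sketch (assuming a data matrix X and response vector y are already in the workspace) requests the CDPSI algorithm with a small maximum support size, following the recommendation above:

```r
# Fit an L0-regularized path using local combinatorial search (CDPSI),
# stopping once the support size exceeds 20 (a small fraction of p)
fit <- L0Learn.fit(X, y, Loss="SquaredError", Penalty="L0",
                   Algorithm="CDPSI", MaxSuppSize=20)
```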
To demonstrate how L0Learn works, we will generate the following synthetic dataset:
- A 500x1000 design matrix X with iid standard normal entries
- A 1000x1 vector B with the first 10 entries set to 1 and the rest set to zero.
- A 500x1 vector e with iid standard normal entries
- Set y = XB + e
set.seed(1) # fix the seed to get a reproducible result
X = matrix(rnorm(500*1000),nrow=500,ncol=1000)
B = c(rep(1,10),rep(0,990))
e = rnorm(500)
y = X%*%B + e
We will use L0Learn to estimate B from the data (y,X). First we load L0Learn:
library(L0Learn)
To fit a path of solutions for the L0-regularized model with a maximum support size of 50 using the CD algorithm, we use the command:
fit = L0Learn.fit(X, y, Loss="SquaredError", Penalty="L0", Algorithm="CD", MaxSuppSize=50)
This will generate solutions for a sequence of Lambda values (chosen automatically by the algorithm). To view the sequence of Lambda values along with the associated support sizes, we use:
print(fit)
and we get the following output:
lambda suppsize
1 0.068285500 1
2 0.055200200 2
3 0.049032300 3
4 0.040072500 6
5 0.038602800 7
6 0.037265300 8
7 0.032514200 10
8 0.001142920 11
9 0.000821221 13
10 0.000702287 14
11 0.000669519 15
12 0.000489943 17
13 0.000412565 22
14 0.000404252 24
15 0.000369975 27
16 0.000357211 31
17 0.000331164 40
18 0.000284271 42
19 0.000240881 50
The sequence of lambda values is a member of the fit object and can be accessed using fit$lambda. For example, the lambda corresponding to the first solution in the path can be retrieved using fit$lambda[1].
To print the estimated B for a particular value of lambda, we use the function coef(fit, lambda), which takes the fit object as the first parameter and the value of lambda corresponding to the desired solution as the second parameter. Note that (in this example) the solution at index 7 has a support size of 10. We can retrieve this solution using the coef function as follows:
coef(fit,lambda=fit$lambda[7])
to get the following output:
1001 x 1 sparse Matrix of class "dgCMatrix"
Intercept 0.01052402
V1 1.01601044
V2 1.01830944
V3 1.00606875
V4 0.98309180
V5 0.97389883
V6 0.96148076
V7 1.00990714
V8 1.08535507
V9 1.02686930
V10 0.94235619
V11 .
V12 .
V13 .
V14 .
V15 .
V16 .
V17 .
V18 .
V19 .
V20 .
.
.
.
The output is a sparse vector of type dgCMatrix. The first element in the vector is the intercept, and the rest are the B coefficients. Aside from the intercept, the only non-zeros in the above solution are coordinates V1, V2, V3, ..., V10, which are the non-zero coordinates in the true support (used to generate the data). Thus, this solution successfully recovers the true support. We can also make predictions using a specific solution in the grid via the function predict(fit, newx, lambda), where newx is a testing sample (vector or matrix). For example, to predict the response for the samples in the data matrix X using the solution at index 7, we call the prediction function:
predict(fit,X,lambda=fit$lambda[7])
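The coefficient and prediction functions above can be combined to check the recovered support and the training fit; a brief sketch continuing the session above (as.vector is used here to convert the sparse results to plain numeric vectors):

```r
# Recover the estimated support (excluding the intercept) for the
# solution at index 7
beta <- as.vector(coef(fit, lambda=fit$lambda[7]))
support <- which(beta[-1] != 0)

# Compare predictions against the observed responses (training MSE)
yhat <- as.vector(predict(fit, X, lambda=fit$lambda[7]))
mse <- mean((y - yhat)^2)
```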
We have demonstrated the simple case of using an L0 penalty alone. We can also run L0Learn with the L0L2 and L0L1 penalties, which typically lead to better predictive models due to the additional shrinkage. For example, to fit a model using the L0L2 penalty over a two-dimensional grid of lambda and gamma values, we call L0Learn.fit with Penalty="L0L2" as follows:
fit = L0Learn.fit(X, y, Loss="SquaredError", Penalty="L0L2", NGamma = 10, GammaMin = 0.0001, GammaMax = 10, Algorithm="CD", MaxSuppSize=50)
Note that in the above call, we set the number of gamma points in the grid to 10 using the NGamma parameter, and we specified the maximum and minimum values of gamma using GammaMax and GammaMin. L0Learn will generate a grid of 10 gamma values equispaced on the logarithmic scale between GammaMin and GammaMax. Similar to the case of L0, we can print a summary of the regularization path using print(fit), which leads to the following output:
lambda gamma suppsize
1 0.003251690 1.000000e+01 1
2 0.002776000 1.000000e+01 2
3 0.002687010 1.000000e+01 3
4 0.002577800 1.000000e+01 3
5 0.002062240 1.000000e+01 6
6 0.001861720 1.000000e+01 7
7 0.001710400 1.000000e+01 8
8 0.001644570 1.000000e+01 9
9 0.001153900 1.000000e+01 10
10 0.000482462 1.000000e+01 10
11 0.000385970 1.000000e+01 12
12 0.000374391 1.000000e+01 14
13 0.000363159 1.000000e+01 15
14 0.000352264 1.000000e+01 15
15 0.000281811 1.000000e+01 19
16 0.000273357 1.000000e+01 19
17 0.000218686 1.000000e+01 29
18 0.000212125 1.000000e+01 31
19 0.000205761 1.000000e+01 35
20 0.000199589 1.000000e+01 38
21 0.000193601 1.000000e+01 38
22 0.000154881 1.000000e+01 66
23 0.010401300 2.782559e+00 1
24 0.008879650 2.782559e+00 2
25 0.008594990 2.782559e+00 2
26 0.006875990 2.782559e+00 6
27 0.005955120 2.782559e+00 7
.
.
.
The sequence of gamma values is a member of the fit object and can be extracted using fit$gamma. Moreover, in this case, the ith item in the list fit$lambda is the sequence of lambda values generated for fit$gamma[i]. We can extract the solution at a specific pair of lambda and gamma values using the coef(fit, lambda, gamma) function. For example, the solution with index 9 in the above output can be extracted using coef(fit, lambda=0.0011539, gamma=10), or equivalently using coef(fit, lambda=fit$lambda[[1]][9], gamma=fit$gamma[1]). Similarly, predictions at a specific solution can be made using predict(fit, newx, lambda, gamma). We note that the previous discussion on L0L2 also applies to L0L1 by simply changing the Penalty parameter in the call to L0Learn.fit.
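For example, a sketch of the corresponding L0L1 call; we assume here (per the parameter descriptions above, which require GammaMax and GammaMin only for "L0L2") that the gamma grid limits can be omitted for "L0L1":

```r
# Fit a two-dimensional regularization path using the L0L1 penalty,
# with 10 gamma grid points and a maximum support size of 50
fit <- L0Learn.fit(X, y, Loss="SquaredError", Penalty="L0L1",
                   NGamma=10, MaxSuppSize=50)
```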