Merge pull request #472 from oooo26/master
Update document
oooo26 authored Jan 13, 2023
2 parents c057806 + 8a964e8 commit 266a7e9
Showing 18 changed files with 58 additions and 72 deletions.
13 changes: 8 additions & 5 deletions docs/Changelog.rst
@@ -1,22 +1,25 @@
Changelog
=========

Unreleased
----------
Version 0.4.6
-------------

- R package
- Python package

- Support `score` function for all GLM estimators.
- Rearrange some arguments to improve legibility.
Please check `here <https://abess.readthedocs.io/en/latest/Python-package/index.html>`__ for the latest API.
- Better docstring, e.g. move important arguments to the front.
- Combine `metrics.py` and `functions.py`.

- C++

- Support the base model for GLM. The Sparse GLM model can be implemented easilier.
- Support the base model for GLM. The Sparse GLM model can be implemented much easilier.
- Re-write logistic, poisson and gamma regression on the basis of GLM base model.

Versions 0.4.2 -- 0.4.5
----------
-----------------------

- R package

@@ -52,7 +55,7 @@ Versions 0.4.2 -- 0.4.5
`Junhao Huang <https://github.com/oooo26>`__!

Version 0.4.1
----------
-------------

- R package

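The `score` method added for all GLM estimators above can be exercised with a minimal sketch along the following lines (the data helper `make_glm_data` and the default arguments are assumptions about the abess Python API, not part of this changelog):

```python
import numpy as np
from abess import LinearRegression
from abess.datasets import make_glm_data

np.random.seed(1)
# Simulated Gaussian data: 200 samples, 50 predictors, 5 true nonzero coefficients.
data = make_glm_data(n=200, p=50, k=5, family="gaussian")

model = LinearRegression()
model.fit(data.x, data.y)
# score() is assumed to be available on every GLM estimator after this release;
# the exact metric it reports depends on the estimator (see the linked API docs).
print(model.score(data.x, data.y))
```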
2 changes: 1 addition & 1 deletion docs/Contributing/AfterCodeDeveloping.rst
@@ -1,5 +1,5 @@
After Code Developing
===========
=====================

CodeFactor
----------
2 changes: 1 addition & 1 deletion docs/Contributing/AppendixArchitecture.rst
@@ -1,5 +1,5 @@
Appendix: Architecture of **abess**
========================
===================================

In this page, we briefly introduce our core code of ``abess``, which is summarized in the Figure below.

4 changes: 2 additions & 2 deletions docs/Contributing/Bug-NewFeatures.rst
@@ -1,5 +1,5 @@
Bug Report or New Feature Request
============
=================================

Bugs Report
-----------
@@ -27,4 +27,4 @@ When suggesting a new feature, please:
- explain in detail how it would work.
- keep the scope as narrow as possible, to make it easier to understand
and implementation.
- provide few important literatures if possible.
- provide few important literatures if possible.
16 changes: 8 additions & 8 deletions docs/Contributing/ContributeDocs.rst
@@ -1,10 +1,10 @@
Contribute documentation
============
========================

.. _general development procedure:

General development procedure
~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you're a more experienced with the ``abess`` and are looking forward to
improve your open source development skills, the next step up is to
@@ -77,14 +77,14 @@ parameters it requires, such as
Also note that the style of Python document is similar to
`numpydoc <https://numpydoc.readthedocs.io/en/latest/format.html>`__.

The development of Python API's documentation relies on
The development of Python API's documentation mainly relies on
`Sphinx <https://pypi.org/project/Sphinx/>`__,
`nbsphinx <https://pypi.org/project/nbsphinx/>`__ (support jupyter
notebook for Sphinx),
`myst-parser <https://pypi.org/project/myst-parser/>`__ (support
`sphinx-gallery <https://pypi.org/project/sphinx-gallery/>`__ (support
markdown for Sphinx),
`sphinx-rtd-theme <https://pypi.org/project/sphinx-rtd-theme/>`__
(support “Read the Docs” theme for Sphinx). Make sure these packages
(support “Read the Docs” theme for Sphinx) and so on.

Please make sure all packages in :code:`docs/requirements.txt`
have been installed.

Tutorials
@@ -107,7 +107,7 @@ The development of the tutorial relies on `sphinix-gallery <https://pypi.org/pro
.. _python document development:

Document development
^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^

Before developing document, we presume that you have already complete the steps 1-3 described in `general development procedure`_,
and you have installed necessary packages, including: ``sphinix-gallery``, ``Sphinx``, ``nbsphinx``, ``myst-parser``, ``sphinx-rtd-theme``.
2 changes: 1 addition & 1 deletion docs/Contributing/ContributePyR.rst
@@ -1,5 +1,5 @@
Contribute Python/R code
============
========================

If you are a experienced programmer, you might want to help new features
development or bug fixing for the abess library. The preferred workflow
2 changes: 1 addition & 1 deletion docs/Contributing/DevelopNewFeatures.rst
@@ -1,5 +1,5 @@
Develop New Features
===============
====================

In this tutorial, we will show you how to develop a new algorithm for specific best-subset problem with ``abess``'s procedure.

4 changes: 2 additions & 2 deletions docs/Contributing/index.rst
@@ -1,6 +1,6 @@
######################
############
Contributing
######################
############

Contributions are welcome! No matter your current skills, it's possible
to make valuable contribution to the ``abess``.
4 changes: 2 additions & 2 deletions docs/Tutorial/1-glm/plot_1_LinearRegression.py
@@ -21,10 +21,10 @@
# Next, we present an example to show the ``abess`` package can get an optimal estimation.
#
# Toward optimality: adaptive best-subset selection
# ^^^^^^^^^^^^^^^^^^^^^^
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# Synthetic dataset
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~
#
# We generate a design matrix :math:`X` containing :math:`n = 300` observations and each observation has :math:`p = 1000` predictors.
# The response variable :math:`y` is linearly related to the first, second, and fifth predictors in :math:`X`:
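The hidden code of this hunk generates the synthetic data described above; a rough stand-in (plain numpy, with illustrative coefficient values that are an assumption rather than the tutorial's exact ones) might read:

```python
import numpy as np
from abess import LinearRegression

rng = np.random.default_rng(0)
n, p = 300, 1000
X = rng.normal(size=(n, p))

# Only the 1st, 2nd and 5th predictors carry signal; the coefficient values
# below are illustrative, not the tutorial's.
beta = np.zeros(p)
beta[[0, 1, 4]] = [3.0, 1.5, 2.0]
y = X @ beta + rng.normal(scale=0.5, size=n)

model = LinearRegression()
model.fit(X, y)
print(np.nonzero(model.coef_)[0])  # ideally recovers indices 0, 1, 4
```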
14 changes: 6 additions & 8 deletions docs/Tutorial/1-glm/plot_4_CoxRegression.py
@@ -1,7 +1,7 @@
"""
==============
=================================
Survival Analysis: Cox Regression
==============
=================================
"""
###############################################################################
# Cox Proportional Hazards Regression
@@ -49,7 +49,7 @@
# which is independent of time.
#
# Lung Cancer Dataset Analysis
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# We are going to apply best subset selection to the NCCTG Lung Cancer Dataset from https://www.kaggle.com/ukveteran/ncctg-lung-cancer-data.
# This dataset consists of survival information of patients with advanced lung cancer from the North Central Cancer Treatment Group.
# The proportional hazards model allows the analysis of survival data by regression modeling.
@@ -114,7 +114,6 @@
# After fitting, the coefficients are stored in ``model.coef_``,
# and the non-zero values indicate the variables used in our model.


print(model.coef_)

# %%
@@ -175,10 +174,9 @@
# the sample with the higher risk prediction will experience an event
# before the other sample or belong to a higher binary class.

from abess.metrics import concordance_index_censored

cindex = concordance_index_censored(test[:, 1] == 2, test[:, 0], pred)
print(cindex[0])
test[:, 1] = test[:, 1] == 2
cindex = model.score(test[:, 2:], test[:, :2])
print(cindex)

# %%
# On this dataset, the C-index is about 0.68.
4 changes: 2 additions & 2 deletions docs/Tutorial/1-glm/plot_a1_power_of_abess.py
@@ -24,7 +24,7 @@

######################################
# Simulation Setting
# ^^^^^^^^^^
# ^^^^^^^^^^^^^^^^^^
#
# Both packages are compared in three aspects including the prediction
# performance, the variable selection performance, and the computation
@@ -66,7 +66,7 @@

##############################################################
# Numerical Results
# ^^^^^^^^^^^^^^^^^^^^
# ^^^^^^^^^^^^^^^^^
#
# For linear regression, we compare three methods in the two packages:
# Lasso, OMP and abess. For logistic regression, we compare two
Binary file modified docs/Tutorial/5-scikit-learn-connection/causal_model.png
@@ -99,8 +99,4 @@
plt.show()

# %%
#
#
#
# sphinx_gallery_thumbnail_path = 'Tutorial/figure/scikit_learn.png'
#
12 changes: 4 additions & 8 deletions docs/Tutorial/5-scikit-learn-connection/plot_2_geomstats.py
@@ -1,11 +1,11 @@
"""
Work with geomstats
======================
===================
"""

# %%
# The package `geomstats` is used for computations and statistics on nonlinear manifolds,
#such as Hypersphere,Hyperbolic Space, Symmetric-Positive-Definite (SPD) Matrices Space and Skew-Symmetric Matrices Space.
# such as Hypersphere,Hyperbolic Space, Symmetric-Positive-Definite (SPD) Matrices Space and Skew-Symmetric Matrices Space.
# `abess` also works well with the package `geomstats`.
# Here is an example of using `abess` to do logistic regression of samples on Hypersphere,
# and we will compare the precision score, the recall score and the running time with `abess` and with `scikit-learn`.
@@ -27,7 +27,7 @@

###############################################################################
# An Example
# ---------------------
# ----------
# Two sets of samples on Hypersphere in 3-dimensional Euclidean Space are created.
# The sample points in `data0` are distributed around $[-3/5, 0, 4/5]$, and the sample points in `data1` are distributed around $[3/5, 0, 4/5]$.
# The sample size of both is set to 100, and the precision of both is set to 5.
@@ -107,7 +107,7 @@
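A compressed stand-in for the hidden example code of this hunk, with the sphere sampling replaced by plain numpy draws renormalized onto the unit sphere (so the generation step is an assumption, not geomstats' own sampler):

```python
import numpy as np
from abess import LogisticRegression
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)

def sample_near(center, n=100, scale=0.2):
    # Draw noisy points around `center` and project them back onto the sphere.
    pts = center + scale * rng.normal(size=(n, 3))
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)

data0 = sample_near(np.array([-3 / 5, 0.0, 4 / 5]))  # class 0
data1 = sample_near(np.array([3 / 5, 0.0, 4 / 5]))   # class 1
X = np.vstack([data0, data1])
y = np.repeat([0, 1], 100)

model = LogisticRegression()
model.fit(X, y)
pred = model.predict(X)
print(precision_score(y, pred), recall_score(y, pred))
```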

###############################################################################
# Comparison
# -------------
# ----------
# Here is the comparison of the precision score and the recall score with `abess` and `scikit-learn`, and
# the comparison of the running time with `abess` and `scikit-learn`.
#
@@ -212,8 +212,4 @@
# And the running time with `abess` is only slightly slower than that without `abess`.

# %%
#
#
#
# sphinx_gallery_thumbnail_path = 'Tutorial/figure/geomstats.png'
#
@@ -2,17 +2,17 @@
================================
Work with DoubleML
================================
Double machine learning offer a debiased way for estimating low-dimensional parameter of interest in the presence of
Double machine learning [1]_ offer a debiased way for estimating low-dimensional parameter of interest in the presence of
high-dimensional nuisance. Many machine learning methods can be used to estimate the nuisance parameters, such as random
forests, lasso or post-lasso, neural nets, boosted regression trees, and so on. The Python package ``DoubleML`` provide an
forests, lasso or post-lasso, neural nets, boosted regression trees, and so on. The Python package ``DoubleML`` [2]_ provide an
implementation of the double machine learning. It's built on top of scikit-learn and is an excellent package. The
object-oriented implementation of ``DoubleML`` is very flexible, in particular functionalities to estimate double machine
learning models and to perform statistical inference via the methods fit, bootstrap, confint, p_adjust and tune.
"""

###############################################################################
#
# In fact, ``abess`` also works well with the package ``DoubleML``. Here is an example of using ``abess`` to solve such
# In fact, ``abess`` [3]_ also works well with the package ``DoubleML``. Here is an example of using ``abess`` to solve such
# a problem, and we will compare it to the lasso regression.


@@ -28,7 +28,7 @@

###############################################################################
# Partially linear regression (PLR) model
# ^^^^^^^^^^^^^^^^^
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# PLR models take the form
#
# .. math::
@@ -47,7 +47,7 @@

###############################################################################
# Data
# """""""""""""""
# """"
# We simulate the data from a PLR model, which both :math:`m_0` and :math:`g_0` are low-dimensional linear combinations
# of :math:`X`, and we save the data as ``DoubleMLData`` class.

@@ -67,7 +67,7 @@

###############################################################################
# Model fitting with ``abess``
# """""""""""""""
# """"""""""""""""""""""""""""
# Based on the simulated data, now we are going to illustrate how to integrate the ``abess`` with ``DoubleML``. To
# estimate the PLR model with the double machine learning algorithm, first we need to choose a learner to estimate the
# nuisance parameters :math:`\eta_0 = (m_0, g_0)`. Considering the sparsity of the data, we can use the adaptive best
@@ -90,7 +90,7 @@
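A sketch of the integration the (truncated) paragraph above describes, using DoubleML's built-in PLR data generator instead of the tutorial's own simulation; the learner argument names (`ml_l`, `ml_m`) vary between DoubleML releases, so treat them as an assumption about the installed version:

```python
import numpy as np
from abess import LinearRegression
from doubleml import DoubleMLPLR
from doubleml.datasets import make_plr_CCDDHNR2018

np.random.seed(0)
# Partially linear data already wrapped in a DoubleMLData object.
dml_data = make_plr_CCDDHNR2018(n_obs=500, dim_x=20)

# abess estimators follow the scikit-learn interface, so they can act as
# the nuisance learners for m_0 and g_0.
dml_plr = DoubleMLPLR(dml_data, LinearRegression(), LinearRegression(), n_folds=5)
dml_plr.fit()
print(dml_plr.summary)  # point estimate and confidence interval for the target parameter
```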

###############################################################################
# Comparison with lasso
# ^^^^^^^^^^^^^^^^^
# ^^^^^^^^^^^^^^^^^^^^^
# The lasso regression is a shrinkage and variable selection method for regression models, which can also be used in
# high-dimensional setting. Here, we compare the abess regression with the lasso regression at different variable
# dimensions.
@@ -190,8 +190,4 @@
#

# %%
#
#
#
# sphinx_gallery_thumbnail_path = 'Tutorial/figure/doubleml.png'
#
12 changes: 6 additions & 6 deletions docs/Tutorial/5-scikit-learn-connection/plot_4_pyts.py
@@ -1,7 +1,7 @@
"""
================================
==============
Work with pyts
================================
==============
``pyts`` is a Python package dedicated to time series classification. It aims to make time series classification
easily accessible by providing preprocessing and utility tools, and implementations of several time series
classification algorithms. In this example, we will mainly focus on the shapelets-based algorithms.
@@ -29,7 +29,7 @@

# %%
# Data
# """""""""""""""
# """"
# In this example, we use the buint-in coffee dataset in ``pyts`` to perform shapelets learning. It has two classes,
# 0 and 1. So, this is a binary classification task. Both train dataset and test dataset have 28 time series and the
# dimension of each time series is 286. We plot the time series in the train dataset.
@@ -53,7 +53,7 @@

# %%
# Learning shapelets with ``abess``
# """""""""""""""""""""""""""""""""""
# """""""""""""""""""""""""""""""""
# To select discriminant shapelets, we first collect all subsequences with predefined length and step as the candidates.
# Then we transform the original time series by computing the distance between them to each subsequence. Therefore,
# the original time series are transformed to some ultra high dimensional vectors. Finally, we perform binary
@@ -132,7 +132,7 @@ def fit_predict(self, size=None):
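A condensed sketch of the transform-then-select idea just described (the window length and step are chosen arbitrarily here, and the helper below is a hypothetical simplification of the diff's `fit_predict` wrapper, not the tutorial's code):

```python
import numpy as np
from abess import LogisticRegression
from pyts.datasets import load_coffee

X_train, X_test, y_train, y_test = load_coffee(return_X_y=True)

def shapelet_features(X, candidates):
    # Distance from each series to each candidate shapelet: the minimum
    # Euclidean distance over all alignments of the window.
    feats = np.empty((len(X), len(candidates)))
    for i, series in enumerate(X):
        windows = np.lib.stride_tricks.sliding_window_view(series, candidates.shape[1])
        for j, cand in enumerate(candidates):
            feats[i, j] = np.min(np.linalg.norm(windows - cand, axis=1))
    return feats

# Candidate shapelets: every training window of length 30, taken with step 10.
width, step = 30, 10
candidates = np.vstack(
    [np.lib.stride_tricks.sliding_window_view(s, width)[::step] for s in X_train]
)

# Sparse logistic regression on the ultra high dimensional transformed data
# keeps only a handful of discriminant shapelets.
model = LogisticRegression(support_size=5)
model.fit(shapelet_features(X_train, candidates), y_train)
print(model.score(shapelet_features(X_test, candidates), y_test))
```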

# %%
# Learning shapelets with ``pyts``
# """"""""""""""""""""""""""""""""""
# """"""""""""""""""""""""""""""""
# We compare our method with the one implemented in ``pyts``, which is a two-step procedure. First, it selects discriminant
# shapelets based on mutual information. Then, a support vector machine is applied to perform binary classification with
# transformed time series based on those selected shapelets. Analogously, we print the performance and execution time.
@@ -154,7 +154,7 @@ def fit_predict(self, size=None):

# %%
# Plot: learned shapelets
# """""""""""""""""""""""""
# """""""""""""""""""""""
# The following figure shows the discriminant shapelets selected by these two methods.

# %%
