Doc: Added 3 examples on regression analysis
Create a new Linthurst use case for regression
mbaudin47 authored and jschueller committed Jun 19, 2023
1 parent 97c1757 commit 7179737
Showing 10 changed files with 1,340 additions and 69 deletions.
2 changes: 2 additions & 0 deletions ChangeLog
@@ -38,6 +38,7 @@
* UniformOverMesh (openturns.experimental)
* GeneralizedExtremeValueValidation (openturns.experimental)
* Coles (openturns.usecases.coles)
* Linthurst (openturns.usecases.linthurst)

==== API changes ====
* Deprecated LinearLeastSquaresCalibration::getCandidate, use getStartingPoint
@@ -79,6 +80,7 @@

=== Documentation ===
* Add API documentation to common use cases pages
* Added new use case: Linthurst

=== Python module ===

3 changes: 3 additions & 0 deletions python/doc/bibliography.rst
@@ -33,6 +33,9 @@ Bibliography
uncertainty propagation and sensitivity analysis.*, PhD thesis.
Blaise Pascal University-Clermont II, France, 2009.
`pdf <https://tel.archives-ouvertes.fr/tel-00440197/document>`__
.. [blatman2011] Blatman, G., and Sudret, B.
*Adaptive sparse polynomial chaos expansion based on least angle regression.*
Journal of Computational Physics 230 (2011) 2345–2367.
.. [burnham2002] Burnham, K.P., and Anderson, D.R. *Model Selection and
Multimodel Inference: A Practical Information Theoretic Approach*, Springer,
2002.
@@ -14,61 +14,31 @@
from openturns.viewer import View
import numpy as np
import matplotlib.pyplot as plt
from openturns.usecases import linthurst

# %%
# We define the data.
#

# %%
ds = linthurst.Linthurst()
dimension = ds.data.getDimension() - 1

input_sample = ds.data[:, 1: dimension + 1]
print("Input:")
print(input_sample[:5])
output_sample = ds.data[:, 0]
print("Output:")
print(output_sample[:5])

input_description = input_sample.getDescription()
output_description = output_sample.getDescription()

n = input_sample.getSize()
print("n = ", n)
dimension = ds.data.getDimension() - 1
print("dimension = ", dimension)

description = ["BIO", "SAL", "pH", "K", "Na", "Zn"]
data = [
[676, 33, 5, 1441.67, 35185.5, 16.4524],
[516, 35, 4.75, 1299.19, 28170.4, 13.9852],
[1052, 32, 4.2, 1154.27, 26455, 15.3276],
[868, 30, 4.4, 1045.15, 25072.9, 17.3128],
[1008, 33, 5.55, 521.62, 31664.2, 22.3312],
[436, 33, 5.05, 1273.02, 25491.7, 12.2778],
[544, 36, 4.25, 1346.35, 20877.3, 17.8225],
[680, 30, 4.45, 1253.88, 25621.3, 14.3516],
[640, 38, 4.75, 1242.65, 27587.3, 13.6826],
[492, 30, 4.6, 1281.95, 26511.7, 11.7566],
[984, 30, 4.1, 553.69, 7886.5, 9.882],
[1400, 37, 3.45, 494.74, 14596, 16.6752],
[1276, 33, 3.45, 525.97, 9826.8, 12.373],
[1736, 36, 4.1, 571.14, 11978.4, 9.4058],
[1004, 30, 3.5, 408.64, 10368.6, 14.9302],
[396, 30, 3.25, 646.65, 17307.4, 31.2865],
[352, 27, 3.35, 514.03, 12822, 30.1652],
[328, 29, 3.2, 350.73, 8582.6, 28.5901],
[392, 34, 3.35, 496.29, 12369.5, 19.8795],
[236, 36, 3.3, 580.92, 14731.9, 18.5056],
[392, 30, 3.25, 535.82, 15060.6, 22.1344],
[268, 28, 3.25, 490.34, 11056.3, 28.6101],
[252, 31, 3.2, 552.39, 8118.9, 23.1908],
[236, 31, 3.2, 661.32, 13009.5, 24.6917],
[340, 35, 3.35, 672.15, 15003.7, 22.6758],
[2436, 29, 7.1, 528.65, 10225, 0.3729],
[2216, 35, 7.35, 563.13, 8024.2, 0.2703],
[2096, 35, 7.45, 497.96, 10393, 0.3205],
[1660, 30, 7.45, 458.38, 8711.6, 0.2648],
[2272, 30, 7.4, 498.25, 10239.6, 0.2105],
[824, 26, 4.85, 936.26, 20436, 18.9875],
[1196, 29, 4.6, 894.79, 12519.9, 20.9687],
[1960, 25, 5.2, 941.36, 18979, 23.9841],
[2080, 26, 4.75, 1038.79, 22986.1, 19.9727],
[1764, 26, 5.2, 898.05, 11704.5, 21.3864],
[412, 25, 4.55, 989.87, 17721, 23.7063],
[416, 26, 3.95, 951.28, 16485.2, 30.5589],
[504, 26, 3.7, 939.83, 17101.3, 26.8415],
[492, 27, 3.75, 925.42, 17849, 27.7292],
[636, 27, 4.15, 954.11, 16949.6, 21.5699],
[1756, 24, 5.6, 720.72, 11344.6, 19.6531],
[1232, 27, 5.35, 782.09, 14752.4, 20.3295],
[1400, 26, 5.5, 773.3, 13649.8, 19.588],
[1620, 28, 5.5, 829.26, 14533, 20.1328],
[1560, 28, 5.4, 856.96, 16892.2, 19.242],
]
input_description = ["SAL", "pH", "K", "Na", "Zn"]
output_description = ["BIO"]
sample = ot.Sample(np.array(data))
dimension = sample.getDimension() - 1
n = sample.getSize()

# %%
# Complete linear model
@@ -80,11 +50,9 @@
# We start by creating a linear model which takes into account all of the physicochemical variables present within the Linthurst data set.
#
# Let us consider the following linear model :math:`\tilde{Y} = a_0 + \sum_{i = 1}^{d} a_i X_i + \epsilon`. If all of the predictive variables
# are considered, the regression can be performed with the help of the `LinearModelAlgorithm` class.
# are considered, the regression can be performed with the help of the :class:`~openturns.LinearModelAlgorithm` class.

# %%
input_sample = sample[:, 1: dimension + 1]
output_sample = sample[:, 0]
algo_full = ot.LinearModelAlgorithm(input_sample, output_sample)
algo_full.run()
result_full = ot.LinearModelResult(algo_full.getResult())
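As an aside, the fit performed by :class:`~openturns.LinearModelAlgorithm` amounts to an ordinary least squares solve with an intercept. The following sketch is illustrative only and uses NumPy with synthetic data, not the Linthurst values.

```python
# Illustrative sketch: OLS with an intercept, as LinearModelAlgorithm fits
# internally. The data and coefficients below are synthetic.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(45, 5))  # 45 observations, 5 predictors (as in Linthurst)
true_coeffs = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = 10.0 + X @ true_coeffs + rng.normal(scale=0.1, size=45)

# Prepend a column of ones for the intercept a_0, then solve min ||A a - y||_2.
A = np.column_stack([np.ones(len(X)), X])
coeffs, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print("intercept:", coeffs[0])
print("slopes   :", coeffs[1:])
```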
@@ -1,16 +1,21 @@
"""
Chaos cross-validation
----------------------
Polynomial chaos expansion cross-validation
===========================================
"""

# %%
# Introduction
# ------------
#
# In this example, we show how to perform the cross-validation of the
# :ref:`Ishigami model<use-case-ishigami>` using polynomial chaos expansion.
# More precisely, we use the methods suggested in [muller2016]_, chapter 5, page 257.
# We make the assumption that a dataset is given and we create a metamodel using this data.
# Once done, we want to assess the quality of the metamodel using the data we have.
# Another example of this method is presented in
# :doc:`/auto_numerical_methods/general_methods/plot_pce_design`.
#
# We compare several methods:
# In this example, we compare two methods:
#
# - split the data into two subsets, one for training and one for testing,
# - use k-fold validation.
@@ -24,7 +29,6 @@
# It uses the K-Fold method with :math:`k = 5`.
# The code uses the :class:`~openturns.KFoldSplitter` class, which computes the appropriate indices.
# Similar results can be obtained with :class:`~openturns.LeaveOneOutSplitter` at a higher cost.
#
# This cross-validation method is used here to see which polynomial degree
# leads to an accurate metamodel of the Ishigami test function.
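The index bookkeeping done by :class:`~openturns.KFoldSplitter` can be sketched in pure NumPy. This is illustrative only; `kfold_indices` is a hypothetical helper assuming contiguous folds, not the class itself.

```python
import numpy as np

def kfold_indices(size, n_folds):
    # Partition range(size) into n_folds contiguous test folds;
    # the training set for each fold is the complement of the fold.
    indices = np.arange(size)
    folds = np.array_split(indices, n_folds)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test

for train, test in kfold_indices(10, 5):
    print(len(train), len(test))
```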

@@ -33,8 +37,17 @@
import openturns.viewer as otv
from openturns.usecases import ishigami_function

# %%
# Define helper functions
# -----------------------

# %%
# The next function creates a polynomial chaos expansion from
# a training data set and a total degree.

# %%


def compute_sparse_least_squares_chaos(
inputTrain, outputTrain, basis, totalDegree, distribution
):
@@ -70,20 +83,22 @@ def compute_sparse_least_squares_chaos(
projectionStrategy = ot.LeastSquaresStrategy(
inputTrain, outputTrain, selectionAlgorithm
)
enumfunc = basis.getEnumerateFunction()
P = enumfunc.getStrataCumulatedCardinal(totalDegree)
adaptiveStrategy = ot.FixedStrategy(basis, P)
chaosalgo = ot.FunctionalChaosAlgorithm(
enumerateFunction = basis.getEnumerateFunction()
basisSize = enumerateFunction.getBasisSizeFromTotalDegree(totalDegree)
adaptiveStrategy = ot.FixedStrategy(basis, basisSize)
chaosAlgo = ot.FunctionalChaosAlgorithm(
inputTrain, outputTrain, distribution, adaptiveStrategy, projectionStrategy
)
chaosalgo.run()
result = chaosalgo.getResult()
chaosAlgo.run()
result = chaosAlgo.getResult()
return result
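For context, the basis size used above grows combinatorially with the total degree: assuming the default linear enumeration of multi-indices, it equals the binomial coefficient C(d + p, p). A hypothetical sketch (`basis_size` is not an OpenTURNS function):

```python
# Illustrative: for input dimension d and total degree p, the number of
# multi-indices of total degree at most p (linear enumeration) is C(d + p, p),
# which is what getBasisSizeFromTotalDegree is expected to return.
from math import comb

def basis_size(dimension, total_degree):
    return comb(dimension + total_degree, total_degree)

print(basis_size(3, 5))  # Ishigami has d = 3; degree 5 -> 56 terms
```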


# %%
# The next function computes the Q2 score by splitting the data set
# into a training set and a test set.


# %%
def compute_Q2_score_by_splitting(
X, Y, basis, totalDegree, distribution, split_fraction=0.75
):
@@ -119,8 +134,9 @@ def compute_Q2_score_by_splitting(


# %%
# The next function computes the Q2 score by K-Fold.


# %%
def compute_Q2_score_by_kfold(X, Y, basis, totalDegree, distribution, n_folds=5):
"""
Compute score by KFold.
@@ -155,6 +171,10 @@ def compute_Q2_score_by_kfold(X, Y, basis, totalDegree, distribution, n_folds=5)
return Q2_score
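The score returned by both functions follows the usual definition :math:`Q^2 = 1 - SS_{res} / SS_{tot}`. A minimal NumPy sketch of that formula (illustrative; `q2_score` is a hypothetical helper mirroring what :class:`~openturns.MetaModelValidation` computes):

```python
import numpy as np

def q2_score(y_true, y_pred):
    # Q2 = 1 - (sum of squared residuals) / (total sum of squares)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

print(q2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect prediction -> 1.0
```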


# %%
# Define the training data set
# ----------------------------

# %%
# We start by generating the input and output sample. We use a sample size equal to 1000.

@@ -195,6 +215,10 @@ def compute_Q2_score_by_kfold(X, Y, basis, totalDegree, distribution, n_folds=5)
# %%
metamodel = result.getMetaModel()

# %%
# Validate the metamodel from a test set
# --------------------------------------

# %%
# In order to validate our polynomial chaos, we generate an extra pair of
# input / output samples and use the :class:`~openturns.MetaModelValidation` class.
@@ -228,6 +252,10 @@ def compute_Q2_score_by_kfold(X, Y, basis, totalDegree, distribution, n_folds=5)
# The K-Fold validation aims at solving some of these issues, so that all the
# available data is used in order to estimate the :math:`Q^2` score.

# %%
# Compute the Q2 score from a test set
# ------------------------------------

# %%
# In the following script, we compute the :math:`Q^2` score associated with each polynomial degree from 1 to 10.
degree_max = 10
@@ -251,6 +279,10 @@ def compute_Q2_score_by_kfold(X, Y, basis, totalDegree, distribution, n_folds=5)
# %%
# We see that the polynomial degree may be increased up to degree 7,
# after which the :math:`Q^2` score does not increase much.

# %%
# Compute the Q2 score from K-Fold cross-validation
# -------------------------------------------------
#
# One limitation of the previous method is that the estimate of the :math:`Q^2` may be sensitive to the particular split of the dataset.
# The following script uses 5-Fold cross validation to estimate the :math:`Q^2` score.
@@ -272,6 +304,10 @@ def compute_Q2_score_by_kfold(X, Y, basis, totalDegree, distribution, n_folds=5)

# %%
# The conclusion is similar to the previous method.

# %%
# Conclusion
# ----------
#
# When we select the best polynomial degree which maximizes the :math:`Q^2` score,
# the danger is that the validation set is used both for computing the :math:`Q^2` and to maximize it:
