This is the repository for the project of the course Bayesian Statistics held by Professor A. Guglielmi at Politecnico di Milano during the academic year 2020/2021. The tutor of the project was R. Corradin, PhD.
We were given 26 multivariate functional observation with 4 components, each of which represents a somatosensory evoked potential, i.e. the electrical modifications occurring in the central nervous system following a stimulus, detected within 2 seconds after the stimulus over 1600 instants.
The aim of the project is to implement a functional clustering algorithm in a Bayesian nonparametric framework in a univariate functional setting. In particular, we assume an infinite mixture model for our functional obervations with a Dirichlet Process as mixture distribution. Our aim is to sample from Dirichlet process in order to get a sample of the latent partition induced. A Gibbs sampler is implemented for our purposes.
The algorithm and the model will be first tested on simulated data and then on clinical ones.
A short explanation of the model is made in this file, for further details and for a deep mathematical explanation please refer to the report.
The core of the Gibbs Sampler is implemented in the function FBNP_hyper.R
.
Our observed functions are assume to be realizations of random functions that follow a mixture of Gaussian Processes, with a Dirichlet Process as mixture distribution. We are interested in the latent random parameters defyining the Gaussian Process of each observation and we will exploit ties to cluster them. We perform a dimensionality reduction assuming stationarity and independence between different time points. On the other hand, the mean operator is expanded over a basis that we assume exactly spans the space of our data.
Our sampling algorithm is a Blocked Gibbs sampler presented in
Ishwaran, H., and James, L.F. (2001). “Gibbs sampling methods for stick-breaking priors”. Journal of the American Statistical Association, 96.453, pp. 161–173.
It is noteworthy that since we consider an hyperprior structure, we will refer to an slightly modified version of the Blocked Gibbs sampler algorithm, taking into account the full conditional for the hyperparameters.
The repository is presents the following elements:
main.R
: R-script with all the steps for running the Blocked Gibbs samplerTools
: folder containing all the script needed inmain.R
, among whomFBNP.R
containing a function that implements Gibbs sampler without hyperpriors andFBNP_hyper.R
containing a function that implements Gibbs Sampler with hyperpriorsPosterior inference
: folder contanining the scriptmain_posterior inference.R
for performing posterior inference for the latent partition and convergence diagnostic and all the script neeeded in the main oneGaussian process
: folder containing the scriptsmain_GP_indep.R
andmain_GP_exp.R
that perform the Blocked Gibbs sampler on the simulated data with diagonal and exponential covariance operator and all the needed scriptAlltime
: folder containingmain_alltime.R
that performs the Blocked Gibbs sampler changing the prior structure on the covariance operator, setting a different prior on different time instantsAjroldi-Bortolotti-Marchionni-Functional-BNP-Clustering.pdf
: report of the project
The file install.R
provides automatic installation of the required packages.
We simulate 10 trajectories of 3 different Gaussian Processes with different sinusoidal mean operator and a covariance operator coherent with our model (for further details please refer to report). After having conveniently tuned the parameters, our algorithm is able to properly cluster the simulated data with the hyperprior structure, as we can appreciate in the following picture.
To reproduce these results just run the scripts main_GP_exp.R
or main_GP_indep.R
.
Since we work in a univariate setting, we select one of the four components of our functional observations. In particular, we run our algorithm on the right lobe short latency signal and then for the other components. For posterior inference on the latent partition please refer to the report.
Unfortunately, data are not publicly avaiable and thus were not reported in this repository.
Computation were performed using
R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
For all the R-packages used in the codes please refer to section Reference in the report.
Niccolò Ajroldi - Politecnico di Milano
Teresa Bortolotti - Politecnico di Milano
Edoardo Marchionni - Politecnico di Milano
(Master of Science in Mathematical engineering students at Politecnico di Milano)