Giorgio Alfredo Spedicato edited this page Mar 24, 2015 · 11 revisions

Improving markovchain R package

Summary: Improve the markovchain package by speeding up existing functions and adding new functionality for probabilistic analysis, statistical estimation, and inference.

Description: The markovchain package contains classes and methods for handling Discrete Time Markov Chain (DTMC) processes. It also provides basic functionality for probabilistic analysis (e.g. classification of states) and some statistical inference (estimating the parameters by maximum likelihood). The package has gained some popularity because it is easy to use. Two kinds of work are needed: internal code optimization that keeps the end-user interface unchanged, and enhancements to the statistical and probabilistic functions provided (more fitting methods, improved algorithms to classify states).
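
As an illustration of the maximum likelihood estimation mentioned above, the following is a minimal sketch in plain C++ rather than the package's actual code. It assumes states are already encoded as integer indices, and the names `countTransitions` and `mleTransitionMatrix` are hypothetical. The estimate is the usual one for a DTMC: the probability of moving from state i to state j is the number of observed i to j transitions divided by the number of observed departures from i.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-stochastic matrix stored as a vector of rows.
using Matrix = std::vector<std::vector<double>>;
using Counts = std::vector<std::vector<long>>;

// Count the observed one-step transitions in a sequence of state indices.
// Assumption: every value in seq lies in [0, nStates).
Counts countTransitions(const std::vector<std::size_t>& seq, std::size_t nStates) {
    Counts counts(nStates, std::vector<long>(nStates, 0));
    for (std::size_t t = 0; t + 1 < seq.size(); ++t) {
        assert(seq[t] < nStates && seq[t + 1] < nStates);
        ++counts[seq[t]][seq[t + 1]];
    }
    return counts;
}

// Maximum likelihood estimate: p_ij = n_ij / sum_k n_ik.
// Rows with no observed departures are left as all zeros.
Matrix mleTransitionMatrix(const Counts& counts) {
    const std::size_t n = counts.size();
    Matrix p(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        long rowTotal = 0;
        for (long c : counts[i]) rowTotal += c;
        if (rowTotal == 0) continue;
        for (std::size_t j = 0; j < n; ++j) {
            p[i][j] = static_cast<double>(counts[i][j]) / static_cast<double>(rowTotal);
        }
    }
    return p;
}
```

How rows without observations should be handled (zeros, as here, or some other convention) is a design choice that the existing end-user interface would dictate.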

Related work: The package's vignette provide a very good introduction. Other packages that it is worth to inspect are clickstream and depmixS4. Suggestions on functionalities that could be added can be found in the references cited in the vignettes for example: Discrete-Time Markov Models, Bremaud (1999); Probability Book, Chapter 11, Snell; Markov Chains, J.R. Norris, 1998.

Potential tasks:

Skills required: deep knowledge of probability, stochastic processes, and R and C++ programming. In more detail:

  • Knowledge of the underlying DTMC properties and of algorithms for state classification.
  • Experience with R and Rcpp, since one goal of this project is to speed up the package using Rcpp.
  • Knowledge of R package creation.
  • At least basic git knowledge (e.g. branching) and experience with GitHub.
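
To give a flavour of the state classification algorithms referred to in the list above, here is a small sketch in plain C++. It is not the package's algorithm: it uses a simple cubic-time transitive closure (Warshall's algorithm) on the graph of positive transition probabilities, and the function names are hypothetical. Two states communicate when each can reach the other, and in a finite chain a state is recurrent exactly when every state it can reach can also reach it back.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// reach[i][j] is true when state j can be reached from state i
// in zero or more steps (Warshall's transitive closure).
std::vector<std::vector<bool>> reachability(const Matrix& p) {
    const std::size_t n = p.size();
    std::vector<std::vector<bool>> reach(n, std::vector<bool>(n, false));
    for (std::size_t i = 0; i < n; ++i) {
        assert(p[i].size() == n);  // the transition matrix must be square
        for (std::size_t j = 0; j < n; ++j) {
            reach[i][j] = (i == j) || p[i][j] > 0.0;
        }
    }
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            if (reach[i][k])
                for (std::size_t j = 0; j < n; ++j)
                    if (reach[k][j]) reach[i][j] = true;
    return reach;
}

// Label each state with the index of its communicating class.
std::vector<int> communicatingClasses(const Matrix& p) {
    const auto reach = reachability(p);
    const std::size_t n = p.size();
    std::vector<int> label(n, -1);
    int next = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (label[i] != -1) continue;
        label[i] = next;
        for (std::size_t j = i + 1; j < n; ++j) {
            if (reach[i][j] && reach[j][i]) label[j] = next;
        }
        ++next;
    }
    return label;
}

// In a finite chain, state i is recurrent iff its class is closed:
// every state reachable from i can reach i back.
bool isRecurrent(const Matrix& p, std::size_t i) {
    const auto reach = reachability(p);
    for (std::size_t j = 0; j < p.size(); ++j) {
        if (reach[i][j] && !reach[j][i]) return false;
    }
    return true;
}
```

For large chains a linear-time strongly connected components algorithm would scale better; the closure is used here only because it is short and easy to check.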

Test: Fork the package on GitHub and create a pull request implementing an Rcpp version of the markovchainFit function and its underlying hidden support functions, within the current end-user interface (parameters may be added to the function and slots to the output objects, but none may be removed). Implementing the estimation of confidence intervals for MLE fits would be a greatly appreciated plus.
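
One possible reading of the confidence interval extension is sketched below in plain C++. This is an assumption, not a specification of what the mentors expect or of what the package does: it uses the normal-approximation (Wald) interval for a single transition probability, treating the departures from state i as binomial trials, and the names `Interval` and `waldInterval` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Point estimate and interval bounds for one transition probability.
struct Interval {
    double estimate;
    double lower;
    double upper;
};

// Wald interval for p_ij given n_ij observed i -> j transitions out of
// n_i observed departures from state i. z = 1.96 corresponds to a
// nominal 95% level. Bounds are clamped to [0, 1].
Interval waldInterval(long nij, long ni, double z = 1.96) {
    assert(ni > 0 && nij >= 0 && nij <= ni);
    const double p = static_cast<double>(nij) / static_cast<double>(ni);
    const double se = std::sqrt(p * (1.0 - p) / static_cast<double>(ni));
    return {p, std::max(0.0, p - z * se), std::min(1.0, p + z * se)};
}
```

The Wald interval is known to behave poorly when n_i is small or the estimate is close to 0 or 1, so a real implementation might prefer another interval or a bootstrap.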

Mentor: Giorgio Alfredo Spedicato (spedygiorgio{at}gmail{dot}com) and Christophe Dutang (dutangc{at}gmail{dot}com) as academic mentor.

Disclaimer: A reduced version of the existing vignette is going to be submitted to the R Journal. The student will be invited, if interested, to contribute as a coauthor to a forthcoming JSS submission on DTMC estimation.