Barry Rowlingson edited this page Mar 10, 2015 · 3 revisions

InFER: Inference For Epidemic-related Risk

Summary: This project will make Bayesian epidemic risk prediction accessible to epidemiologists. It aims to hide the implementational complexities of real-time inference behind an R interface, creating a revolutionary enabling tool for epidemiological research and decision-making during disease outbreaks.

Description: Bayesian inference for epidemics is a recently developed, powerful method for fitting spatial dynamical disease models to observed data in real-time quantitative risk prediction. Although disease simulation packages are available as tools for epidemiologists, there is currently no comprehensive toolkit for formal parameter inference. Moreover, the complexity of Bayesian algorithm implementation presents a barrier to the uptake of these methods by the scientific community.

Chris Jewell has proposed a GPU-accelerated C++ framework for quickly constructing fast MCMC algorithms to perform inference on a wide class of epidemic models. The approach makes use of spatial population data and the detection times and identities of individual disease cases. It has been applied to the 2001 UK foot and mouth disease outbreak, scaling to a population of 188000 farms with 2026 detected infections. GPU technology allows sequential risk predictions within an overnight timeframe, a 200-fold speedup over a single CPU. Preliminary code shows that embedding this framework in an R package is not only possible, but will lead to a streamlined platform for high-performance computing in epidemiology.

Related work:

  • EpiEstim, R0 -- estimation of the basic reproduction number only.
  • EpiDynamics and EpiModel -- simulation from epidemic models only.
  • Outbreaker -- inference on a limited class of models; CPU implementation only and therefore slow.

Potential tasks:

  • Streamlining of the R package build system to cope with dependencies on CUDA, CUDPP, Boost, HDF5 and GSL.
  • The posterior samples generated by the MCMC are written to an HDF5 file. This provides a flexible way to store samples where the dimension of the posterior changes from one iteration to the next. A rudimentary R interface for reading these files as a data.frame has already been written. However, further development of both the interface and GIS embedding is required.
  • Although the CUDA-accelerated calculations provide a massive speed improvement over the CPU, they only achieve this for large datasets. Moreover, they require a high-spec GPU card to be installed in the machine. Equivalent CPU-based code is more efficient for small datasets and will work where no GPU is available. Automatic selection between the GPU and CPU algorithms therefore needs to be developed.
  • Design of an efficient, dynamic way of specifying an epidemic model using the R interface. Currently, changing the model requires manual edits to the C++ code and recompilation. An S4 class hierarchy is proposed to achieve this, allowing the user to specify a particular epidemic model for which fitting and/or simulation algorithms are automatically dispatched.
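The automatic GPU/CPU selection task above could start from something like the following minimal sketch. Everything in it is an assumption made for illustration: the names `LikelihoodBackend`, `CpuBackend`, `GpuBackend` and `selectBackend` are hypothetical and do not come from the InFER code base, GPU availability is passed in as a plain flag rather than queried from CUDA, and the default threshold of 10000 individuals is a placeholder that would need to be replaced by a benchmarked value.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical interface shared by the CPU and GPU implementations.
class LikelihoodBackend {
public:
    virtual ~LikelihoodBackend() = default;
    virtual std::string name() const = 0;
};

// Placeholder for the CPU-based code path.
class CpuBackend : public LikelihoodBackend {
public:
    std::string name() const override { return "cpu"; }
};

// Placeholder for the CUDA-accelerated code path.
class GpuBackend : public LikelihoodBackend {
public:
    std::string name() const override { return "gpu"; }
};

// Choose a backend from the population size and GPU availability.
// gpuThreshold is an assumed tuning parameter, not a measured break-even point.
std::unique_ptr<LikelihoodBackend> selectBackend(std::size_t populationSize,
                                                 bool gpuAvailable,
                                                 std::size_t gpuThreshold = 10000) {
    assert(gpuThreshold > 0);  // a zero threshold would send empty datasets to the GPU
    if (gpuAvailable && populationSize >= gpuThreshold) {
        return std::unique_ptr<LikelihoodBackend>(new GpuBackend());
    }
    // Small datasets, or machines without a suitable GPU, fall back to the CPU.
    return std::unique_ptr<LikelihoodBackend>(new CpuBackend());
}
```

Under these assumptions, `selectBackend(188000, true)` would return the GPU backend, while the same call with `false`, or with a population of a few hundred, would return the CPU backend. Keeping both behind one interface means the calling R code would not need to know which one was chosen.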

Skills required: R package, Autotools, and C++ code development. A working knowledge of inheritance, templates, and CUDA programming is recommended though not essential.

Test:

  • Clone the [InFER project Git repository](http://github.com/chrism0dwk/infer). Switch to the 'gsoc' branch. Install the R package located at infer/R/infer, referring to the README file where necessary.
  • In C++, write a class hierarchy to calculate the value of the Normal(\mu, \sigma) and Gamma(\alpha, \lambda) probability density functions (pdfs) for any given value of a random variable. Your code should use a pure virtual base class as an interface to these calculations. You may use any underlying software library to help you (e.g. a C library returning values from these pdfs). Use the unit test code found in the test repository, reading the README carefully, to test your code.
  • Write a Makefile to compile the unit test, committing and pushing to your fork of the test repository.
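To illustrate the kind of design the second test item asks for, here is a minimal sketch of a pure virtual interface with two concrete distributions. It is not the reference solution and does not follow the interface expected by the unit test code in the test repository. The class names `Density`, `NormalDensity` and `GammaDensity` are hypothetical, and the parameterisation is an assumption: \sigma is taken to be the standard deviation, and Gamma(\alpha, \lambda) is taken to have shape \alpha and rate \lambda.

```cpp
#include <cassert>
#include <cmath>

// Pure virtual interface: every distribution exposes its density via pdf().
class Density {
public:
    virtual ~Density() = default;
    virtual double pdf(double x) const = 0;
};

// Normal(mu, sigma); sigma is assumed to be the standard deviation.
class NormalDensity : public Density {
public:
    NormalDensity(double mu, double sigma) : mu_(mu), sigma_(sigma) {
        assert(sigma_ > 0.0);
    }
    double pdf(double x) const override {
        const double sqrtTwoPi = std::sqrt(2.0 * 3.141592653589793);
        const double z = (x - mu_) / sigma_;
        return std::exp(-0.5 * z * z) / (sigma_ * sqrtTwoPi);
    }
private:
    double mu_;
    double sigma_;
};

// Gamma(alpha, lambda); alpha is assumed to be the shape and lambda the rate.
class GammaDensity : public Density {
public:
    GammaDensity(double alpha, double lambda) : alpha_(alpha), lambda_(lambda) {
        assert(alpha_ > 0.0 && lambda_ > 0.0);
    }
    double pdf(double x) const override {
        // For simplicity the boundary point x = 0 is treated as outside the support.
        if (x <= 0.0) return 0.0;
        // Work on the log scale to avoid overflow in Gamma(alpha) for large alpha.
        const double logPdf = alpha_ * std::log(lambda_)
                            + (alpha_ - 1.0) * std::log(x)
                            - lambda_ * x
                            - std::lgamma(alpha_);
        return std::exp(logPdf);
    }
private:
    double alpha_;
    double lambda_;
};
```

Client code can then hold a `const Density&` or a pointer to `Density` and call `pdf(x)` without knowing which distribution sits behind it, which is the point of the pure virtual base class.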

Mentors: Chris Jewell ([@](mailto:chrism0dwk {at} gmail {dot} com)) and Barry Rowlingson ([@](mailto:b.rowlingson {at} lancaster {dot} ac {dot} uk)).

Disclaimer: If the student agrees, they will be proposed as a contributing author in a forthcoming journal article describing the methodology used in this package.