This repository contains the implementation of the SNP-Slice algorithm, a Bayesian nonparametric method for resolving multi-strain infections. The motivation for this problem, a description of the algorithm, and our results are presented in the bioRxiv preprint "SNP-Slice Resolves Mixed Infections: Simultaneously Unveiling Strain Haplotypes and Linking Them to Hosts" (https://www.biorxiv.org/content/10.1101/2023.07.29.551098v2).
The directory is structured as follows:
- snpslicemain.R (the main execution file).
- inputdata/ (a directory to store input data files, named prefix_read1.txt, prefix_read0.txt and prefix_cat.txt).
- output/ (a directory to store output data (A,D)).
- mcmcRData/ (a directory to store RData files for warm start).
- source/ (a directory containing the actual implementation of the algorithm).
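The input file naming convention above can be sketched in R as follows. This is only an illustration: the helper name input_paths is hypothetical and not part of the repository, and it assumes the files sit directly under inputdata/. Which of the three files a given run actually needs is not stated here, so the check only reports what is absent.

```r
# Hypothetical helper (not part of the repository): build the expected
# input file paths for a given prefix, following the naming convention
# prefix_read1.txt, prefix_read0.txt and prefix_cat.txt.
input_paths <- function(prefix, dir = "inputdata") {
  suffixes <- c("read1", "read0", "cat")
  paths <- file.path(dir, paste0(prefix, "_", suffixes, ".txt"))
  names(paths) <- suffixes
  paths
}

paths <- input_paths("scenario1")
print(paths)

# Report which of the expected files are absent, if any.
missing <- paths[!file.exists(paths)]
if (length(missing) > 0) {
  message("Not found: ", paste(missing, collapse = ", "))
}
```

Running this from the repository root before launching the sampler gives a quick confirmation that the prefix matches the files on disk.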
- First, specify a prefix in snpslicemain.R. For example, after setting prefix <- "scenario1" on line 21 of snpslicemain.R, the script will read scenario1_read1.txt and scenario1_read0.txt from the inputdata directory.
- Now you can run the algorithm from the command line with, for example, Rscript snpslicemain.R model=3 nmcmc=10000 alpha=2 gap=100.
- You can also choose which model to use by setting the value of model. We recommend setting model on the command line rather than in the execution file. The default is the Negative Binomial model. This is the codebook:
  - model <- 0 for the cat model
  - model <- 1 for the Poisson model
  - model <- 2 for the Binomial model
  - model <- 3 for the Negative Binomial model
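To illustrate how key=value arguments and the model codebook fit together, here is a minimal sketch in R. It is not the repository's actual parsing code: the function parse_args and the vector model_names are hypothetical names introduced for this example, and the sketch assumes every value is numeric, as in the example command above.

```r
# Hypothetical sketch (not the repository's actual parsing code):
# turn "key=value" command-line arguments into a named list.
parse_args <- function(args) {
  out <- list()
  for (arg in args) {
    parts <- strsplit(arg, "=", fixed = TRUE)[[1]]
    if (length(parts) != 2) stop("Expected key=value, got: ", arg)
    # Assumption: all values are numeric, as in the example command.
    out[[parts[1]]] <- as.numeric(parts[2])
  }
  out
}

# Codebook from the list above.
model_names <- c("0" = "cat", "1" = "Poisson",
                 "2" = "Binomial", "3" = "Negative Binomial")

# Inside a script run with Rscript, commandArgs(trailingOnly = TRUE)
# would supply this vector; it is hard-coded here so the sketch runs alone.
args <- c("model=3", "nmcmc=10000", "alpha=2", "gap=100")
opts <- parse_args(args)

# Fall back to the documented default (Negative Binomial, model 3).
model <- if (is.null(opts[["model"]])) 3 else opts[["model"]]
cat("Using the", model_names[[as.character(model)]], "model\n")
```

Keeping the codebook in a named vector makes the selected model readable in logs without changing the integer values passed on the command line.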