Created by Ariella Gladstein, based on code from Consuelo Quinto Cortes and Krishna Veeramah.
Also worked on by David Christy, Logan Gantner, and Mack Skodiak.
agladstein@email.arizona.edu
SimPrily runs genome simulations with user-defined parameters, or with parameters drawn at random from prior distributions, and computes genomic statistics on the simulation output.
Version 1
- Run genome simulations with a model defined by prior distributions of parameters and a demographic model structure.
- Account for SNP array ascertainment bias by creating a pseudo array based on priors for the number of samples in the discovery populations and the allele frequency cut-off.
- Calculate genomic summary statistics on simulated genomes and pseudo arrays.
This is ideal for use with Approximate Bayesian Computation on whole genome or SNP array data.
Uses the C++ programs macs and GERMLINE. For more information on these programs, see:
https://github.com/gchen98/macs
https://github.com/sgusev/GERMLINE
cd to the directory you want to work in, then clone the repository:
git clone https://github.com/agladstein/SimPrily.git
If using Vagrant (this is recommended if running on non-Linux OS):
Start Vagrant, ssh into Vagrant, cd to SimPrily directory:
vagrant up
vagrant ssh
cd /vagrant
Install the virtual environment and install the requirements.
./setup/setup_env_vbox_2.7.sh
If not using Vagrant, just install the virtual environment and install the requirements:
./setup/setup_env_2.7.sh
e.g. one test simulation:
python simprily.py -p examples/eg1/param_file_eg1_asc.txt -m examples/eg1/model_file_eg1_asc.csv -g genetic_map_b37/genetic_map_GRCh37_chr1.txt.macshs -a array_template/ill_650_test.bed -i 1 -o output_dir -v
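To run several independent test simulations, the same command can be generated once per job id. The following is a minimal sketch, assuming it is launched from the SimPrily directory with the virtual environment active; `build_command` and `run_jobs` are hypothetical helper names, not part of SimPrily.

```python
import subprocess


def build_command(job_id, out_dir="output_dir"):
    """Return the example simprily.py call for one job id as an argument list."""
    return [
        "python", "simprily.py",
        "-p", "examples/eg1/param_file_eg1_asc.txt",
        "-m", "examples/eg1/model_file_eg1_asc.csv",
        "-g", "genetic_map_b37/genetic_map_GRCh37_chr1.txt.macshs",
        "-a", "array_template/ill_650_test.bed",
        "-i", str(job_id),
        "-o", out_dir,
        "-v",
    ]


def run_jobs(job_ids, out_dir="output_dir", dry_run=True):
    """Print each command (dry_run=True) or execute the commands one after another."""
    commands = [build_command(job_id, out_dir) for job_id in job_ids]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Raises CalledProcessError if a simulation exits with an error.
            subprocess.check_call(cmd)
    return commands


# Dry run: only prints the three commands, each with a distinct job id.
commands = run_jobs([1, 2, 3])
```

Each job gets its own id, which avoids reusing an id within the same output directory.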
For quick help:
python simprily.py --help
simprily.py takes 4 required arguments and 2 optional arguments, and help, verbose, and profile options.
Run as
python simprily.py [-h] -p PARAM -m MODEL -i ID -o OUT [-g MAP] [-a ARRAY] [-v] [--profile]
-p PARAM or --param PARAM = The location of the parameter file
-m MODEL or --model MODEL = The location of the model file
-i ID or --id ID = The unique identifier of the job
-o OUT or --out OUT = The location of the output directory
-h or --help = Show a help message and exit
-v = Increase output verbosity. This includes 3 levels: -v, -vv, and -vvv
--profile = Print a log file containing the time in seconds and memory use in Mb for main functions
-g MAP or --map MAP = The location of the genetic map file
-a ARRAY or --array ARRAY = The location of the array template file, in bed form
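The interface described above can be mirrored with Python's standard argparse module. This is only an illustrative sketch of an equivalent parser, not the parser used inside simprily.py; the help strings, the `verbose` destination name, and the file names in the usage line are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with the same flags as the usage line above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="simprily.py")
    # Four required arguments.
    parser.add_argument("-p", "--param", required=True, help="location of the parameter file")
    parser.add_argument("-m", "--model", required=True, help="location of the model file")
    parser.add_argument("-i", "--id", required=True, help="unique identifier of the job")
    parser.add_argument("-o", "--out", required=True, help="location of the output directory")
    # Two optional arguments.
    parser.add_argument("-g", "--map", help="location of the genetic map file")
    parser.add_argument("-a", "--array", help="location of the array template file, in bed form")
    # -v may be repeated (-v, -vv, -vvv); each repeat raises the count by one.
    parser.add_argument("-v", dest="verbose", action="count", default=0,
                        help="increase output verbosity")
    parser.add_argument("--profile", action="store_true",
                        help="log time and memory use of main functions")
    return parser


# Parse an explicit argument list; param.txt and model.csv are placeholder names.
args = build_parser().parse_args(
    ["-p", "param.txt", "-m", "model.csv", "-i", "1", "-o", "output_dir", "-vv"]
)
print(args.verbose)
```

Omitting any of -p, -m, -i, or -o makes this parser stop with a usage error, which matches the required/optional split in the usage line.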
Three subdirectories are created in the directory specified in the output_dir argument.
output_dir/results
output_dir/sim_data
output_dir/germline_out
Intermediate files go to output_dir/sim_data and output_dir/germline_out.
output_dir/sim_data contains PLINK formatted .ped and .map files created from the pseudo array, which are necessary to run GERMLINE.
output_dir/germline_out contains the GERMLINE .match output and .log. The .match contains all of the identified IBD segments.
These files are NOT automatically removed by the Python script, but are unnecessary once the job is complete.
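Because the intermediate files are left in place, they can be deleted by hand once every job writing to the output directory has finished. A minimal cleanup sketch, assuming the directory layout described above; `remove_intermediate_files` is a hypothetical helper, not part of SimPrily.

```python
import os
import shutil

# Subdirectories described above as holding only intermediate files.
INTERMEDIATE_SUBDIRS = ("sim_data", "germline_out")


def remove_intermediate_files(output_dir):
    """Delete the intermediate subdirectories of output_dir and return the removed paths.

    The results subdirectory is never touched.
    """
    removed = []
    for name in INTERMEDIATE_SUBDIRS:
        path = os.path.join(output_dir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

Calling `remove_intermediate_files("output_dir")` removes the intermediate files of every job that shares that directory, so it should only be run after all of them are complete.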
Output files go to output_dir/results.
output_dir/results contains the parameter values used in the simulation and the summary statistics calculated from the simulation.
The first line is a header with the parameter names and summary statistics names.
The second line is the parameter values and summary statistics values.
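A results file with this two-line layout can be read into a name-to-value mapping. The sketch below assumes the columns are separated by whitespace (tabs or spaces), which is not stated above; `parse_results` is a hypothetical helper, and the column names in the sample are placeholders rather than real SimPrily output.

```python
def parse_results(text):
    """Map each header name to its value, given the text of a two-line results file.

    Assumes whitespace-separated columns. Values are kept as strings.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) != 2:
        raise ValueError("expected a header line and one line of values")
    names = lines[0].split()
    values = lines[1].split()
    if len(names) != len(values):
        raise ValueError("header and values have different numbers of columns")
    return dict(zip(names, values))


# Placeholder content, only to show the shape of the input.
sample = "param_A\tparam_B\tstat_C\n100\t0.05\t12.5\n"
results = parse_results(sample)
print(results["stat_C"])
```

Values can be converted with `float(...)` where numeric values are expected.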
Must have an Open Science Grid Connect account.
Create an account at https://osgconnect.net/signup
Log onto Open Science Grid Connect
ssh user-name@login01.osgconnect.net
Working directory must be pegasus_workflow.
Submit a Pegasus workflow (must be in pegasus_workflow):
./submit -p PARAM -m MODEL -j NUM [-g MAP] [-a ARRAY]
e.g.
./submit -p ../examples/eg2/param_file_eg2_asc.txt -m ../examples/eg2/model_file_eg2_asc.csv -j 10 -a ../array_template/ill_650_test.bed -g ../genetic_map_b37/genetic_map_GRCh37_chr1.txt.macshs
The results will appear in
/local-scratch/user-name/workflows/simprily_id
where user-name is specific to the user, and id is the workflow id.
- Main SimPrily GitHub Repository
- CyVerse Discovery Environment SimPrily Workflow
- SimPrily HT File Setup App
- DE SimPrily App
- DE SimPrily Concatenation App
- SimPrily Pegasus Documentation
- If exponential growth is large, the macs simulation will not finish (this is a macs bug).
- If the same id is used with the same output directory as a previous run, the .map file will be appended to.
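One way to avoid the second issue is to check for leftover .map files before starting a run. This sketch assumes the .map files sit directly in the sim_data subdirectory of the output directory; since the file naming scheme is not described here, it conservatively reports every .map file rather than matching on the job id. `existing_map_files` is a hypothetical helper.

```python
import glob
import os


def existing_map_files(output_dir):
    """Return a sorted list of .map files already present in output_dir/sim_data."""
    pattern = os.path.join(output_dir, "sim_data", "*.map")
    return sorted(glob.glob(pattern))


# Warn before launching a new job into a directory that holds earlier output.
leftover = existing_map_files("output_dir")
if leftover:
    print("Found .map files from an earlier run: {}".format(", ".join(leftover)))
```

If any files are reported, choose a new id or output directory, or delete the old intermediate files first.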