Phaseless is designed for genotype imputation and admixture inference using low-coverage sequencing data. First, the imputation model is in the spirit of the fastPHASE model but takes genotype likelihoods as input; STITCH is a comparable method that works on raw reads. Second, the admixture inference is modeled on the haplotype cluster information from the fastPHASE model.
git clone https://github.com/Zilong-Li/phaseless
cd phaseless
make -j6
phaseless provides several subcommands. Run phaseless -h to list them.
By default, phaseless impute is parallelized for imputing the whole genome at once: it runs multiple chunks in parallel, with each chunk handled by one thread. See the --chunksize option.
phaseless impute -g data/bgl.gz -c 10 -n 4 -s 100000
However, you might only be interested in imputing a single chunk. To parallelize the work within a single chunk instead, toggle the behavior with the --single-chunk option.
phaseless impute -g data/bgl.gz -c 10 -n 4 -S
With the binary file written by the impute command above, we can run admixture inference for different numbers of ancestries k.
phaseless admix -b impute.pars.bin -k 3 -n 4
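To compare several values of k, the admix command above can be generated in a loop. The following is a minimal sketch, not part of phaseless itself: the helper name admix_cmds is hypothetical, and it assumes that passing -o admix.k3 produces files such as admix.k3.Q so that runs do not overwrite each other. It only prints the commands, using the -b, -k, -n and -o options mentioned in this README.

```shell
#!/bin/sh
# Print one `phaseless admix` command per value of k.
# ASSUMPTION: `-o admix.k<k>` is used as the output prefix, so each run
# writes to its own files (e.g. admix.k3.Q) instead of the default admix.Q.
admix_cmds() {
    pars="$1"      # binary file written by `phaseless impute`
    threads="$2"   # number of threads, passed to -n
    shift 2        # remaining arguments are the values of k
    for k in "$@"; do
        echo "phaseless admix -b $pars -k $k -n $threads -o admix.k$k"
    done
}

# Print the commands for k = 2..5 using 4 threads.
admix_cmds impute.pars.bin 4 2 3 4 5
```

Since the function only prints, the commands can be reviewed first and then executed by piping the output to sh.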
We can also inspect and manipulate the parameters of the fastPHASE model using the binary file written by the impute command.
phaseless parse -b impute.pars.bin -c 0 ## single chunk, all samples
phaseless parse -b impute.pars.bin -c -1 -s samples.txt ## all chunks, specific samples
Now, we can do some interesting plotting.
./misc/plot_haplotype_cluster.R
If the output prefix -o is not specified, the output filenames of the above commands are as follows:
❯ tree -L 1
.
├── admix.Q
├── admix.log
├── parse.haplike.bin
├── parse.log
├── impute.recomb
├── impute.pi
├── impute.vcf.gz
├── impute.pars.bin
└── impute.log
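As a quick sanity check after a full run, one could verify that all of the default output files from the listing above are present. This is only a sketch: the helper name missing_outputs is hypothetical, and it assumes impute, parse and admix were all run in the same directory without -o.

```shell
#!/bin/sh
# Print the default output files (names taken from the listing above)
# that are missing from a directory. Prints nothing if all are present.
missing_outputs() {
    dir="${1:-.}"   # directory to check, defaults to the current one
    for f in impute.log impute.pars.bin impute.vcf.gz impute.pi impute.recomb \
             parse.log parse.haplike.bin admix.log admix.Q; do
        [ -e "$dir/$f" ] || echo "$f"
    done
}

# Check the current directory.
missing_outputs .
```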
Check out the news file.