SNPs are used as markers for various population genetic purposes. Non-biallelic SNPs are of particular interest for identification: to reach the same discriminative power, more biallelic than non-biallelic SNPs are required.
In this project, we present a tool that uses the NCBI public database dbSNP to identify non-biallelic SNPs.
This work was published in FSI Genetics in 2009.
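To illustrate why more biallelic than non-biallelic SNPs are needed for the same discriminative power, the sketch below compares the random match probability of one biallelic and one triallelic SNP. It is not part of the tool. It assumes Hardy-Weinberg equilibrium and equal allele frequencies, and the function name `match_probability` is made up for this example.

```python
import math


def match_probability(allele_freqs):
    """Chance that two random individuals share a genotype at one SNP.

    Assumes Hardy-Weinberg equilibrium; allele_freqs are fractions that
    sum to 1. This is an illustrative helper, not code from the tool.
    """
    if not math.isclose(sum(allele_freqs), 1.0):
        raise ValueError("allele frequencies must sum to 1")
    n = len(allele_freqs)
    prob = 0.0
    for i in range(n):
        # Homozygous genotype i/i occurs with frequency p_i^2.
        prob += (allele_freqs[i] ** 2) ** 2
        for j in range(i + 1, n):
            # Heterozygous genotype i/j occurs with frequency 2 * p_i * p_j.
            prob += (2 * allele_freqs[i] * allele_freqs[j]) ** 2
    return prob


biallelic = match_probability([0.5, 0.5])
triallelic = match_probability([1 / 3, 1 / 3, 1 / 3])

# Number of biallelic SNPs needed to match one triallelic SNP,
# assuming independent markers (match probabilities multiply).
ratio = math.log(triallelic) / math.log(biallelic)

print(f"biallelic:  {biallelic:.4f}")
print(f"triallelic: {triallelic:.4f}")
print(f"biallelic SNPs per triallelic SNP: {ratio:.2f}")
```

Under these idealised assumptions, roughly 1.7 biallelic SNPs are needed for every triallelic SNP. Real allele frequencies are rarely equal, so the actual ratio depends on the markers chosen.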
Install the expat development files:
apt-get install libexpat1-dev
Retrieve the source code and compile the program:
git clone https://github.com/jfjlaros/snp.git
cd snp/src
make
The program requires a dump of the database in XML format. These files are
typically found in the subfolder named genotype of any of the builds hosted
on the download site of the NCBI.
For a file named gt_chrXX.xml.gz, use the following command to find the SNP
candidates:
zcat gt_chrXX.xml.gz | ./snp <threshold> > output.txt
The threshold parameter specifies the minimum allele frequency (as a
percentage). If it is omitted, the threshold defaults to 0. Increasing the
threshold can greatly reduce the amount of output; setting it to 1 or higher
is recommended.
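The effect of the threshold can be sketched as follows. This is not the program's actual implementation: the function names, the dictionary layout, and the inclusive comparison (`>=`) are all assumptions made for illustration. The idea is that alleles below the minimum frequency are ignored, so a SNP with a rare third allele is no longer reported as non-biallelic.

```python
def alleles_above_threshold(allele_freqs, threshold=0.0):
    """Keep alleles whose frequency (in percent) is at least the threshold.

    Hypothetical helper: the real program may compare differently.
    """
    return {a: f for a, f in allele_freqs.items() if f >= threshold}


def is_non_biallelic(allele_freqs, threshold=0.0):
    """A SNP is a candidate if more than two alleles pass the threshold."""
    return len(alleles_above_threshold(allele_freqs, threshold)) > 2


# Made-up allele frequencies (percentages) for a single SNP.
example = {"A": 62.0, "C": 37.5, "T": 0.5}

print(is_non_biallelic(example, 0))  # the rare T allele still counts
print(is_non_biallelic(example, 1))  # T falls below 1% and is ignored
```

With a threshold of 0 the rare allele makes this SNP a candidate; with a threshold of 1 it drops out, which is why a higher threshold shrinks the output.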
Some notes on allele frequencies on the X and Y chromosomes.
Inspired by this research, we looked into the degradation of methylated nucleotides for similar purposes.