Submits a list of genes with SNPs to PolyPhen2 and orders them by a score measuring the overall deleteriousness of the mutations, normalized for gene product length in amino acids (AAs).
Written by Ted Pak for the Roth Laboratory.
Released under an MIT license; please see MIT-LICENSE.txt.
BeautifulSoup, an included library, is released under a new BSD-style license.
Please see its source code or website for more details.
Requires any modern *nix with Python >=2.6 installed. This includes Mac OS X.
$ ./pph2_and_rank.py [-h|--help] [--hg19] [--humdiv] [-q|--quiet] [-s|--sid SID]
[-o|--output OUTPUT] MUT_LIST_1 [MUT_LIST_2 ...]
- -h or --help displays a simple usage message.
- --hg19 tells PolyPhen2 to use build 19 of the human genome assembly instead of build 18.
- --humdiv tells PolyPhen2 to use the HumDiv PolyPhen2 classifier model instead of HumVar.
- -o OUTPUT or --output OUTPUT will cause output to be written to the file named by OUTPUT instead of sending it to standard output.
- -q or --quiet will suppress the status updates normally sent to standard error.
- -s SID or --sid SID allows you to specify a previous GGI session ID that the script will re-attach to. In this case, all the MUT_LIST_$N inputs are ignored. Use this when a previous script has already submitted information to PolyPhen2 but the script was prematurely terminated. The SID is displayed in the status messages that this script prints to standard error once it has been received from PolyPhen's GGI status page.
MUT_LIST_1, MUT_LIST_2 and so on must be files where each line follows the form acc_id mut_1 mut_2 mut_3 ... and each mut_$n is the concatenation of the canonical AA, the position in the AA sequence, and the mutated AA. For example:
Q92889 I706T E875G
Q6ZSZ5 L763F
Q32M45 Q390P E463Q A549T E570K P686H
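The line format above can be sketched as a small parser. This is only an illustration, not the code used by pph2_and_rank.py: the names MUTATION_RE and parse_mutation_line are hypothetical, and the pattern assumes one-letter amino acid codes and an integer position, as in the examples.

```python
import re

# Assumed token shape: one-letter canonical AA, integer position, one-letter mutated AA.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation_line(line):
    """Split 'acc_id mut_1 mut_2 ...' into (acc_id, [(ref_aa, pos, mut_aa), ...])."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected an accession ID and at least one mutation: %r" % line)
    acc_id, tokens = fields[0], fields[1:]
    mutations = []
    for token in tokens:
        match = MUTATION_RE.match(token)
        if match is None:
            raise ValueError("malformed mutation %r for %s" % (token, acc_id))
        ref_aa, pos, mut_aa = match.groups()
        mutations.append((ref_aa, int(pos), mut_aa))
    return acc_id, mutations


print(parse_mutation_line("Q92889 I706T E875G"))
# prints ('Q92889', [('I', 706, 'T'), ('E', 875, 'G')])
```

Rejecting malformed tokens early is a design choice of this sketch; the actual script may handle bad input differently.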
If any one of the MUT_LIST_$N arguments is set to - (hyphen), the list is read from standard input until EOF is encountered. Therefore, you can also do:
$ cat mutations_1.txt mutations_2.txt | ./pph2_and_rank.py -
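One way the hyphen convention could be handled is shown below. This is a sketch under assumptions, not the script's actual implementation: open_mutation_list and read_all_lines are hypothetical helper names, and skipping blank lines is a choice made here for convenience.

```python
import sys


def open_mutation_list(path, stdin=None):
    """Return an iterable of lines; '-' means standard input."""
    if path == "-":
        return stdin if stdin is not None else sys.stdin
    return open(path)


def read_all_lines(paths, stdin=None):
    """Collect the non-blank lines from every input, in argument order."""
    lines = []
    for path in paths:
        handle = open_mutation_list(path, stdin)
        try:
            lines.extend(line.rstrip("\n") for line in handle if line.strip())
        finally:
            # Only close files this function opened; never close standard input.
            if handle is not sys.stdin and handle is not stdin:
                handle.close()
    return lines
```

The optional stdin parameter exists only so that the function can be exercised without a real pipe; called as read_all_lines(sys.argv[1:]) it reads standard input whenever an argument is "-".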
The output is a tab-delimited table of genes sorted by their score, where the score is produced by summing the PolyPhen2 scores, dividing by the AA sequence length of the gene product, and multiplying by (the completely arbitrary constant) 1000. It is sent to standard output, with progress messages sent to standard error.
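The scoring rule described above can be written out as a short sketch. The function names, the gene labels, and the input numbers are hypothetical and chosen only to make the arithmetic easy to follow; the descending sort order is also an assumption, since the text says only that genes are sorted by score.

```python
def gene_score(pph2_scores, protein_length, scale=1000.0):
    """Sum the PolyPhen2 scores, divide by protein length in AAs, multiply by the constant."""
    if protein_length <= 0:
        raise ValueError("protein length must be positive")
    return scale * sum(pph2_scores) / protein_length


def rank_genes(genes):
    """genes maps acc_id -> (list of PolyPhen2 scores, protein length in AAs).

    Returns (acc_id, score) rows, highest score first (assumed order).
    """
    rows = [(acc_id, gene_score(scores, length))
            for acc_id, (scores, length) in genes.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Made-up inputs for illustration only.
example = {"GENE_A": ([0.25], 500), "GENE_B": ([0.5, 1.0], 1500)}
for acc_id, score in rank_genes(example):
    print("%s\t%.3f" % (acc_id, score))
```

With these made-up inputs, GENE_B scores 1000 * 1.5 / 1500 = 1.0 and GENE_A scores 1000 * 0.25 / 500 = 0.5, so GENE_B is listed first.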
To save the output from the script, either redirect standard output:
$ ./pph2_and_rank.py mutations.txt > output.txt
or use the -o or --output option:
$ ./pph2_and_rank.py -o output.txt mutations.txt