Home
Nikita Tikhomirov edited this page Oct 11, 2024 · 11 revisions
piawka is a simple command-line tool for calculating within- and between-population nucleotide diversity statistics from discrete variant calls in VCF files. The name of the tool is pronounced "pi: jaf ka:", after a Russian word meaning "leech". Largely inspired by pixy, piawka builds upon it in a few aspects:
- supports arbitrary ploidy level in the VCF files
- can use multiallelic SNPs; in biallelic mode, it also uses multiallelic SNPs that have only two alleles within the analyzed groups
- handles missing data in a sensible way
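The points above (arbitrary ploidy, multiallelic sites, missing data) all follow naturally if diversity is computed from counts of called alleles rather than from genotypes. The minimal Python sketch below illustrates that idea; it is not piawka's own AWK code. It assumes VCF-style GT strings, skips missing alleles (".") instead of treating them as a state, and pools a window as total differences over total pairwise comparisons, the counting approach pixy describes. Whether piawka aggregates in exactly this way is an assumption, and the function names are hypothetical.

```python
from collections import Counter
import math

def allele_counts(genotypes):
    """Count called alleles across VCF GT strings of any ploidy.

    Both unphased ("/") and phased ("|") separators are accepted;
    missing alleles (".") are skipped rather than counted as a state.
    """
    counts = Counter()
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                counts[allele] += 1
    return counts

def site_pi_terms(genotypes):
    """Return (differences, comparisons) among all pairs of called alleles."""
    counts = allele_counts(genotypes)
    n = sum(counts.values())
    comparisons = n * (n - 1) // 2
    # Pairs carrying the same allele; works for any number of alleles.
    identical = sum(c * (c - 1) // 2 for c in counts.values())
    return comparisons - identical, comparisons

def windowed_pi(sites):
    """Pool differences and comparisons over sites, then take the ratio."""
    diffs = comps = 0
    for genotypes in sites:
        d, c = site_pi_terms(genotypes)
        diffs += d
        comps += c
    return diffs / comps if comps else math.nan

sites = [["0/1", "0/0"], ["0/0", "./."], ["1/1", "0/1"]]
print(windowed_pi(sites))  # 6 differences / 13 comparisons ≈ 0.4615
```

A tetraploid, triallelic call works the same way: `site_pi_terms(["0/0/1/2"])` gives 5 differing pairs out of 6. A site where one sample is missing simply contributes fewer comparisons instead of being dropped or imputed.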
Other features:
- lightweight and portable: runs wherever vanilla AWK can run (Windows, macOS, Linux...) and requires no installation
- fast, and can be parallelized with GNU parallel (see Usage)
- includes additional useful statistics: Fst, a Tajima's D-like statistic (with missing data correction), and Ronfort's rho (useful for inter-ploidy divergence comparisons)
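Between-population statistics can be built from the same allele counts. This page does not say which Fst estimator piawka reports, so the Python sketch below assumes Hudson's formulation, Fst = 1 − π_within / π_between. Here π_within is the mean of the two within-population diversities and π_between (dxy) compares every called allele of one population with every called allele of the other. All function names are hypothetical, and each quantity is pooled as a ratio of sums over sites.

```python
from collections import Counter
import math

def allele_counts(genotypes):
    """Count called alleles across VCF GT strings, skipping missing ones."""
    counts = Counter()
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                counts[allele] += 1
    return counts

def site_pi_terms(genotypes):
    """Within-group (differences, comparisons) over pairs of called alleles."""
    counts = allele_counts(genotypes)
    n = sum(counts.values())
    comparisons = n * (n - 1) // 2
    identical = sum(c * (c - 1) // 2 for c in counts.values())
    return comparisons - identical, comparisons

def site_dxy_terms(pop1, pop2):
    """Between-group (differences, comparisons): each pop1 allele vs each pop2 allele."""
    c1, c2 = allele_counts(pop1), allele_counts(pop2)
    comparisons = sum(c1.values()) * sum(c2.values())
    # Counter returns 0 for alleles absent from pop2.
    identical = sum(n * c2[a] for a, n in c1.items())
    return comparisons - identical, comparisons

def hudson_fst(sites1, sites2):
    """Fst = 1 - mean(pi1, pi2) / dxy, each term pooled over sites."""
    totals = [0] * 6  # d1, c1, d2, c2, d_between, c_between
    for g1, g2 in zip(sites1, sites2):
        terms = (site_pi_terms(g1), site_pi_terms(g2), site_dxy_terms(g1, g2))
        for i, (d, c) in enumerate(terms):
            totals[2 * i] += d
            totals[2 * i + 1] += c
    d1, c1, d2, c2, db, cb = totals
    if not (c1 and c2 and db):
        return math.nan  # undefined without comparisons or between-group differences
    pi_within = (d1 / c1 + d2 / c2) / 2
    return 1 - pi_within / (db / cb)

# A fixed difference between two diploid samples per population.
print(hudson_fst([["0/0", "0/0"]], [["1/1", "1/1"]]))  # 1.0
```

With small samples this estimator can also go slightly negative when the two populations are not differentiated, which is expected behaviour rather than a bug.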
piawka is written in the (very underappreciated) AWK language. AWK is fast and convenient for processing text tables, and using it frees piawka from many dependencies and compilation issues. Recent advances in the GNU implementation of AWK, gawk, make it possible to go beyond one-liner programs thanks to its support for modular code and its interface for writing C extensions.