Home

piawka is a simple command-line tool that calculates within- and between-population nucleotide diversity statistics from discrete variant calls in VCF files. The name of the tool is pronounced pi: jaf ka:, after a Russian word meaning "leech".

Largely inspired by pixy, piawka builds upon it in a few aspects:

supports arbitrary ploidy levels in VCF files
can use multiallelic SNPs; in biallelic mode, it also uses multiallelic SNPs that have only two alleles in the analyzed groups
handles missing data in a sensible way
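
As a rough illustration of what the three points above mean for a single site, here is a minimal Python sketch. It is not piawka's code (piawka is written in awk, and its exact formulas may differ). It assumes the common definition of per-site diversity as the share of allele pairs that differ, computed only over called alleles. The function name `site_pi` and the sample genotypes are hypothetical.

```python
from collections import Counter


def site_pi(genotypes):
    """Per-site nucleotide diversity from VCF-style GT strings.

    Each genotype may have any ploidy ("1", "0/1", "0/0/1/2") and may
    contain missing alleles ("."). Missing alleles are dropped before
    counting, so the denominator reflects only the alleles actually called.
    Returns None when fewer than two alleles are called.
    """
    alleles = []
    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") separators alike.
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                alleles.append(allele)

    n = len(alleles)
    if n < 2:
        return None  # no pairwise comparison is possible

    counts = Counter(alleles)
    # Ordered pairs of identical alleles; any number of distinct alleles works.
    identical_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - identical_pairs / (n * (n - 1))


# A diploid, a tetraploid, a fully missing diploid and a haploid sample.
print(site_pi(["0/1", "0/0/1/2", "./.", "1"]))
```

In this example seven alleles are called (three `0`, three `1`, one `2`), so 30 of the 42 ordered allele pairs differ and the function returns 30/42. The fully missing sample contributes nothing to either the numerator or the denominator.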

Other features:

lightweight and portable: runs wherever vanilla AWK can run (Windows, macOS, Linux...) and requires no installation
fast (and can be parallelized with GNU parallel -- see Usage)
includes additional statistics: Fst, a Tajima's D-like statistic (with missing data correction), and Ronfort's rho (useful for inter-ploidy divergence comparisons)

piawka is written in the (very underappreciated) awk language. Awk is fast and user-friendly at processing text tables, and using it avoids many dependency and compilation issues. Recent advances in gawk, the GNU implementation of awk, make it practical to go beyond one-liner programs, thanks to its support for modular code and its interface for writing C extensions.
