Nikita Tikhomirov edited this page Oct 11, 2024 · 11 revisions

piawka is a simple command-line tool for calculating within- and between-population nucleotide diversity statistics from discrete variant calls in VCF files. The name of the tool is pronounced pi: jaf ka:, after a Russian word meaning "leech".

Largely inspired by pixy, piawka builds upon it in a few aspects:

  • supports arbitrary ploidy levels in the VCF files
  • can use multiallelic SNPs; in biallelic mode, it also uses multiallelic SNPs that have two alleles in the analyzed groups
  • handles missing data in a sensible way
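
To make the ploidy and missing-data points more concrete, here is a minimal Python sketch of a pixy-style estimate of nucleotide diversity. It is not piawka's actual code (piawka is written in AWK), and it assumes the statistic is computed as the total number of pairwise allele differences divided by the total number of pairwise comparisons among non-missing alleles. The function names `site_pi` and `region_pi` are hypothetical and chosen for illustration only.

```python
def site_pi(alleles):
    """Return (differences, comparisons) for one site.

    `alleles` is a flat list of allele calls pooled over all samples
    in a group, so samples of any ploidy can be mixed freely.
    Missing calls are given as None and are simply left out.
    """
    called = [a for a in alleles if a is not None]
    n = len(called)
    comparisons = n * (n - 1) // 2  # all unordered pairs of called alleles

    # Count identical pairs from per-allele counts, then subtract.
    counts = {}
    for a in called:
        counts[a] = counts.get(a, 0) + 1
    identical = sum(c * (c - 1) // 2 for c in counts.values())
    return comparisons - identical, comparisons


def region_pi(sites):
    """Ratio of summed differences to summed comparisons over sites."""
    diffs = comps = 0
    for alleles in sites:
        d, c = site_pi(alleles)
        diffs += d
        comps += c
    return diffs / comps if comps else float("nan")


# One tetraploid genotype 0/0/1/1, then a site where one call is missing.
sites = [[0, 0, 1, 1], [0, None, 1]]
print(region_pi(sites))
```

Because each site contributes only the comparisons that were actually possible, a site with many missing genotypes carries less weight instead of being treated as invariant. Multiallelic sites need no special handling in this sketch, since any two different allele codes count as one difference.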

Other features:

  • small size and portability: runs wherever vanilla AWK can run (Windows, macOS, Linux...) and requires no installation
  • fast (and can be parallelized with GNU parallel -- see Usage)
  • includes additional useful statistics: Fst, a Tajima's D-like statistic (with missing data correction), and Ronfort's rho (useful for inter-ploidy divergence comparisons)

piawka is written in the (very underappreciated) AWK language. AWK is fast and user-friendly for processing text tables, and it frees the tool from many dependency and compilation issues. Recent advances in gawk, the GNU implementation of AWK, allow going beyond one-liner programs thanks to modularity and an interface for writing C extensions.
