Increase precision or recall

SuShiAtGit edited this page Aug 15, 2019 · 2 revisions

The input parameters that have an effect on the resulting taxonomic profile are:

  • minimum alignment length (-l). Minimum length of the alignment between the read and the marker gene (this is not the same as the read length). The default value is 75. Higher values produce fewer false positives (fewer reads pass the filter), while lower values recruit more reads, which allows detection of low-abundance bugs at the cost of more false positives. This parameter should be tuned to the average read length; we suggest choosing a value between 45 and 100.
  • type of read counts (-y). There are three possible values: base.coverage, insert.raw_counts and insert.scaled_counts (default). The insert.* values count the number of inserts (reads) that map to the gene: raw_counts measures the absolute number of reads, while scaled_counts weights the read counts by the gene length. base.coverage measures the average base coverage of the gene.
  • marker genes cutoff (-g). Every mOTU is composed of 10 marker genes, and the read count of the mOTU is calculated as the median of the read counts of the genes that are different from zero. The parameter -g defines the minimum number of genes that have to be different from zero for the mOTU to be counted. The default value is 3 and possible values are between 1 and 10. With -g 1, the detection of one gene is enough to consider the mOTU as present in the sample (this detects low-abundance species, but also more false positives). With -g 6, on the other hand, only mOTUs with at least 6 detected genes are counted, which reduces the number of false positives.
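
The effect of the marker genes cutoff can be illustrated with a minimal Python sketch. This is not the tool's actual code: the function name motu_read_count, the example gene counts, and the choice to return 0.0 for an mOTU that does not pass the cutoff are all assumptions made for illustration. It only follows the description above (median of the non-zero gene counts, applied when at least -g genes are non-zero).

```python
from statistics import median


def motu_read_count(gene_counts, min_genes=3):
    """Illustrative only: median of the non-zero marker gene counts.

    gene_counts: read counts of the marker genes of one mOTU (hypothetical input).
    min_genes:   plays the role of the -g cutoff described above.
    Returns 0.0 when fewer than min_genes genes are non-zero (an assumption
    about how an undetected mOTU would be reported).
    """
    detected = [c for c in gene_counts if c > 0]
    if len(detected) < min_genes:
        return 0.0
    return float(median(detected))


# Hypothetical mOTU with 4 of its 10 marker genes detected.
example = [0, 2, 0, 5, 3, 0, 0, 8, 0, 0]
for g in (1, 3, 6):
    print(f"-g {g}: {motu_read_count(example, min_genes=g)}")
```

In this made-up example the mOTU is counted with -g 1 and -g 3, because 4 genes are detected, and is dropped with -g 6.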

A result with higher sensitivity is obtained with, for example, -g 1 -l 30, which allows detection of low-abundance bugs (at the cost of detecting more false positives).

A result with higher precision is obtained with, for example, -g 6 -l 100, which reduces the number of false positives (at the cost of missing some true positives).