From c0e333036421615a74279f5d718fc3556d7f88ee Mon Sep 17 00:00:00 2001
From: Konstantin Berlin
Date: Tue, 4 Oct 2016 17:42:49 -0400
Subject: [PATCH] Update utilities.rst

---
 docs/source/utilities.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/utilities.rst b/docs/source/utilities.rst
index f041afb..705e067 100644
--- a/docs/source/utilities.rst
+++ b/docs/source/utilities.rst
@@ -14,7 +14,7 @@ Assuming you have a mapping of sequences to a truth (such as a reference genome)

 .. code-block:: bash

-    $ java -cp mhap-2.1.jar edu.umd.marbl.mhap.main.EstimateROC [minimum overlap length to evaluate] [number of random trials] [use dynamic programming] [verbose] [minimum identity of overlap] [maximum different between expected overlap and reported] [load all overlaps]
+    $ java -cp mhap-2.1.1.jar edu.umd.marbl.mhap.main.EstimateROC [minimum overlap length to evaluate] [number of random trials] [use dynamic programming] [verbose] [minimum identity of overlap] [maximum different between expected overlap and reported] [load all overlaps]

 The default minimum overlap length is 2000 and default number of trials is 10000. This will estimate sensitivity/specificity to within 1%. It can be increased at the expense of runtime. Specifying 0 will examine all possible N^2 overlap pairs.

@@ -41,12 +41,12 @@ MHAP includes a tool to simulate sequencing data with random error as well as es

 .. code-block:: bash

-    $ java -cp mhap-2.1.jar edu.umd.marbl.mhap.main.KmerStatSimulator <# sequences> [reference genome]
+    $ java -cp mhap-2.1.1.jar edu.umd.marbl.mhap.main.KmerStatSimulator <# sequences> [reference genome]

 The error rates must be between 0 and 1 and are additive. Specifying 10% insertion, 2% deletion, and 1% substitution will result in sequences with a 13% error rate. If no reference sequence is given, completely random sequences are generated and errors added. Otherwise, random sequences are drawn from the reference and errors added. Errors are added randomly with no bias.

 .. code-block:: bash

-    $ java -cp mhap-2.1.jar edu.umd.marbl.mhap.main.KmerStatSimulator <# trials> [one-sided error] [reference genome] [kmer filter]
+    $ java -cp mhap-2.1.1.jar edu.umd.marbl.mhap.main.KmerStatSimulator <# trials> [one-sided error] [reference genome] [kmer filter]

 This usage will output a distribution of Jaccard similarity between a pair of overlapping sequences with the specified error rate (when using the specified k-mer size) and two random sequences of the same length. If no reference sequence is given, completely random sequences are generated and errors added, otherwise sequences are drawn from the reference. When one-sided error is specified (by typing true for the parameter), only one of the two sequences will have error simulated, matching a mapping of a noisy sequence to a reference. If a set of k-mers for filtering is given, they are excluded when computing Jaccard similarity, both between random and overlapping sequences.
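
For reviewers unfamiliar with the statistic that the second ``KmerStatSimulator`` usage reports, the following is a minimal Python sketch of the idea, not MHAP's implementation. It assumes errors are applied independently per base with additive insertion, deletion, and substitution rates, and that Jaccard similarity is computed over sets of k-mers. The names ``kmers``, ``jaccard``, ``add_errors``, and ``kmer_filter`` are hypothetical and chosen only for illustration, as are the sequence length (2000) and k-mer size (16).

```python
import random


def kmers(seq, k):
    """Return the set of all k-mers (length-k substrings) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k, kmer_filter=frozenset()):
    """Jaccard similarity of the k-mer sets of a and b.

    K-mers listed in kmer_filter are excluded from both sets first.
    """
    ka = kmers(a, k) - kmer_filter
    kb = kmers(b, k) - kmer_filter
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def add_errors(seq, ins, dele, sub, rng):
    """Apply at most one error per base; the total rate is ins + dele + sub."""
    out = []
    for base in seq:
        r = rng.random()
        if r < ins:                      # insertion: random base before this one
            out.append(rng.choice("ACGT"))
            out.append(base)
        elif r < ins + dele:             # deletion: drop this base
            continue
        elif r < ins + dele + sub:       # substitution: replace with a different base
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:                            # no error
            out.append(base)
    return "".join(out)


rng = random.Random(0)
k = 16
ref = "".join(rng.choice("ACGT") for _ in range(2000))
unrelated = "".join(rng.choice("ACGT") for _ in range(2000))

# One-sided error: only the copy is noisy, as when mapping a read to a reference.
# 10% insertion + 2% deletion + 1% substitution gives a 13% total error rate.
noisy = add_errors(ref, 0.10, 0.02, 0.01, rng)

overlap_sim = jaccard(ref, noisy, k)
random_sim = jaccard(ref, unrelated, k)
print("overlapping pair:", overlap_sim)
print("random pair:", random_sim)
```

Repeating the last few lines over many trials would give two distributions of similarity values, one for overlapping pairs and one for random pairs, which is the kind of comparison the tool's output is described as providing.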