diff --git a/docs/source/index.rst b/docs/source/index.rst
index d288223..efc07ad 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -10,7 +10,7 @@ sequence overlapping algorithm.
 Designed to efficiently detect all overlaps between noisy long-read sequence data.
 It efficiently estimates Jaccard similarity by compressing sequences to their representative fingerprints composed on min-mers (minimum k-mer).

-MHAP is included within the Celera Assembler `PBcR `_ pipeline. The Celera Assembler can be downloaded `here `_.
+MHAP is included within the Celera Assembler `PBcR `_ pipeline. The Celera Assembler can be downloaded `here `_.

 Contents:

diff --git a/docs/source/installation.rst b/docs/source/installation.rst
index 099bb1b..a93c238 100644
--- a/docs/source/installation.rst
+++ b/docs/source/installation.rst
@@ -4,11 +4,11 @@ Installation

 Before your start
 =================
-MHAP requires a recent version of the `JVM `_ (1.7u51+). JDK 1.6 or earlier will not work. If you would like to build the code from source, you need to have the `JDK `_ and the `ANT `_ build system available.
+MHAP requires a recent version of the `JVM `_ (1.8+). JDK 1.7 or earlier will not work. If you would like to build the code from source, you need to have the `JDK `_ and the `ANT `_ build system available.

 Prerequisites
 ==============
- * java (1.7u51+)
+ * java (1.8+)
  * ant (1.8.2+)

 Here is a list of currently supported Operating Systems:
@@ -26,19 +26,19 @@ The pre-compiled version is recommended to users who want to run MHAP, without d

 .. code-block:: bash

-    $ wget https://github.com/marbl/MHAP/releases/download/v1.0/mhap-1.0.tar.gz
+    $ wget https://github.com/marbl/MHAP/releases/download/1.6/mhap-1.6.tar.gz

 And if ``wget`` not available, you can use ``curl`` instead:

 .. code-block:: bash

-    $ curl -L https://github.com/marbl/MHAP/releases/download/v1.0/mhap-1.0.tar.gz > mhap-1.0.tar.gz
+    $ curl -L https://github.com/marbl/MHAP/releases/download/1.6/mhap-1.6.tar.gz > mhap-1.6.tar.gz

 Then run

 .. code-block:: bash

-    $ tar xvzf mhap-1.0.tar.gz
+    $ tar xvzf mhap-1.6.tar.gz

 Source
 -----------------
@@ -47,7 +47,7 @@ To build the code from the release:

 .. code-block:: bash

-    $ wget https://github.com/marbl/MHAP/archive/v1.0.zip
+    $ wget https://github.com/marbl/MHAP/archive/1.6.zip

 If you see a certificate not trusted error, you can add the following option to wget:

@@ -59,27 +59,27 @@ And if ``wget`` not available, you can use ``curl`` instead:

 .. code-block:: bash

-    $ curl -L https://github.com/marbl/MHAP/archive/v1.0.zip > v1.0.zip
+    $ curl -L https://github.com/marbl/MHAP/archive/1.6.zip > 1.6.zip

-You can also browse the https://github.com/marbl/MHAP/tree/v1.0
+You can also browse the https://github.com/marbl/MHAP/tree/1.6
 and click on Downloads.

 Once downloaded, extract to unpack:

 .. code-block:: bash

-    $ unzip v1.0.zip
+    $ unzip 1.6.zip

-Change to MetAMOS directory:
+Change to MHAP directory:

 .. code-block:: bash

-    $ cd MHAP-1.0
+    $ cd MHAP-1.6

-Once inside the MetAMOS directory, run:
+Once inside the MHAP directory, run:

 .. code-block:: bash

     $ ant

-This will compile the program and create a target/mhap-1.0.jar file which you can use to run MHAP. The quick-start instructions assume you are in the target directory when running the program. You can also use the target/mhap-0.1.tar file to copy MHAP to a different system or directory.
+This will compile the program and create a target/mhap-1.6.jar file which you can use to run MHAP. The quick-start instructions assume you are in the target directory when running the program. You can also use the target/mhap-1.6.tar file to copy MHAP to a different system or directory.
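The index.rst context in this patch says MHAP estimates Jaccard similarity by compressing sequences to fingerprints made of min-mers. As a rough illustration only, the classic MinHash scheme below keeps, for each of several hash functions, the minimum hash over a sequence's k-mers (its min-mer), and estimates Jaccard similarity as the fraction of hash functions whose minima agree. This is a minimal sketch, not MHAP's implementation: the blake2b k-mer encoding, the (a*x + b) mod (2^61 - 1) hash family, and the names `kmer_ids`, `minmer_sketch` and `estimate_jaccard` are assumptions, and reverse complements are ignored.

```python
import hashlib
import random

# Assumption: a textbook universal hash family h(x) = (a*x + b) mod P,
# with P = 2^61 - 1 (a Mersenne prime). Not MHAP's actual hashing.
P = (1 << 61) - 1

def kmer_ids(seq: str, k: int) -> set[int]:
    """Map every distinct k-mer of seq to a stable 64-bit integer."""
    return {
        int.from_bytes(hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big")
        for i in range(len(seq) - k + 1)
    }

def minmer_sketch(ids: set[int], num_hashes: int, seed: int = 42) -> list[int]:
    """Fingerprint: for each hash function, the smallest hash over all k-mers.

    Every sequence must be sketched with the same seed so that all
    fingerprints are built from the same hash functions.
    """
    if not ids:
        raise ValueError("sequence is shorter than k")
    rng = random.Random(seed)
    params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(num_hashes)]
    return [min((a * x + b) % P for x in ids) for a, b in params]

def estimate_jaccard(sketch_a: list[int], sketch_b: list[int]) -> float:
    """Fraction of hash functions whose minima agree; estimates |A & B| / |A | B|."""
    return sum(x == y for x, y in zip(sketch_a, sketch_b)) / len(sketch_a)
```

With H hash functions the estimate behaves like a binomial proportion, so its standard error is roughly sqrt(J(1 - J)/H); larger sketches trade memory for precision.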
diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
index 8a62755..018724c 100644
--- a/docs/source/quickstart.rst
+++ b/docs/source/quickstart.rst
@@ -9,7 +9,7 @@ Running MHAP provides command-line documenation if you run it without parameters

 .. code-block:: bash

-    $ java -jar mhap-1.0.jar
+    $ java -jar mhap-1.6.jar

 MHAP has two main usage modes, the main finds all overlaps between the input sequences. The second only constructs an index which can be subsequently reused.

@@ -18,7 +18,7 @@ Finding overlaps

 .. code-block:: bash

-    $ java -Xmx32g -server -jar mhap-1.0.jar -s [-q] [-f]
+    $ java -Xmx32g -server -jar mhap-1.6.jar -s [-q] [-f]

 Both the -s and -q options can accept either FastA sequences or binary dat files (generated as described below). The -q option can accept either a file or a directory, in which case all FastA/dat files in the specified directory will be used. By default, only the sequences specified by -s are indexed and the sequences in -q are streamed against the constructed index. Since MHAP is written in Java, the memory usage can be high. Generally, 32GB of RAM is sufficient to index 20K sequences. If you have more sequences, you can partition your data and run MHAP on the partitions. You can also increase the memory MHAP is allowed to use by changing the Xmx parameter to a larger limit.

@@ -36,7 +36,7 @@ Constructing binary index

 .. code-block:: bash

-    $ java -Xmx32g -server -jar mhap-1.0.jar -p -q [-f]
+    $ java -Xmx32g -server -jar mhap-1.6.jar -p -q [-f]

 In this use case, files in the -p directory will be converted to binary dat files in the -q directory. Subsequent runs using the dat files (instead of FastA files) will be faster as the sequences no longer need to be indexed, only loaded into memory.
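The quickstart context says that about 32GB of RAM indexes roughly 20K sequences and suggests partitioning larger inputs. A minimal sketch of that partitioning step, assuming plain multi-line FastA input: `read_fasta`, `partition`, `comparison_plan` and `to_fasta` are hypothetical helpers, not part of MHAP. The plan lists every unordered pair of partitions once, including each partition with itself (m(m+1)/2 jobs for m partitions); how each pair is mapped onto MHAP's -s and -q options is left open here.

```python
from itertools import combinations_with_replacement

def read_fasta(lines):
    """Yield (header, sequence) records from an iterable of FastA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.strip())  # multi-line sequences are concatenated
    if header is not None:
        yield header, "".join(chunks)

def partition(records, max_per_partition):
    """Split records into consecutive lists of at most max_per_partition records."""
    parts, current = [], []
    for rec in records:
        current.append(rec)
        if len(current) == max_per_partition:
            parts.append(current)
            current = []
    if current:
        parts.append(current)
    return parts

def comparison_plan(num_parts):
    """Every unordered pair of partition indices, including (i, i)."""
    return list(combinations_with_replacement(range(num_parts), 2))

def to_fasta(records):
    """Serialize (header, sequence) records back to single-line FastA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

For example, `partition(read_fasta(open(path)), 20000)` would mirror the 20K-sequences-per-32GB guideline quoted in the context lines; each partition can then be written out with `to_fasta`.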
diff --git a/docs/source/utilities.rst b/docs/source/utilities.rst
index 1e10eb8..d58ed1e 100644
--- a/docs/source/utilities.rst
+++ b/docs/source/utilities.rst
@@ -14,7 +14,7 @@ Assuming you have a mapping of sequences to a truth (such as a reference genome)

 .. code-block:: bash

-    $ java -cp mhap-1.0.jar edu.umd.marbl.mhap.main.EstimateROC [minimum overlap length to evaluate] [number of random trials] [use dynamic programming] [verbose]
+    $ java -cp mhap-1.6.jar edu.umd.marbl.mhap.main.EstimateROC [minimum overlap length to evaluate] [number of random trials] [use dynamic programming] [verbose]

 The default minimum overlap length is 2000 and default number of trials is 10000. This will estimate sensitivity/specificity to within 1%. It can be increased at the expense of runtime. Specifying 0 will examine all possible N^2 overlap pairs. If the dynamic programming is turned on (by typing true for the parameter), overlaps not present in the reference mapping will be confirmed if a Smith-Watermann alignment can identify the overlap specified.

@@ -25,12 +25,12 @@ MHAP includes a tool to simulate sequencing data with random error as well as es

 .. code-block:: bash

-    $ java -cp mhap-1.0.jar edu.umd.marbl.mhap.main.KmerStatSimulator <# sequences> [reference genome]
+    $ java -cp mhap-1.6.jar edu.umd.marbl.mhap.main.KmerStatSimulator <# sequences> [reference genome]

 The error rates must be between 0 and 1 and are additive. Specifying 10% insertion, 2% deletion, and 1% substitution will result in sequences with a 13% error rate. If no reference sequence is given, completely random sequences are generated and errors added. Otherwise, random sequences are drawn from the reference and errors added. Errors are added randomly with no bias.

 .. code-block:: bash

-    $ java -cp mhap-1.0.jar edu.umd.marbl.mhap.main.KmerStatSimulator <# trials> [one-sided error] [reference genome] [kmer filter]
+    $ java -cp mhap-1.6.jar edu.umd.marbl.mhap.main.KmerStatSimulator <# trials> [one-sided error] [reference genome] [kmer filter]

 This usage will output a distribution of Jaccard similarity between a pair of overlapping sequences with the specified error rate (when using the specified k-mer size) and two random sequences of the same length. If no reference sequence is given, completely random sequences are generated and errors added, otherwise sequences are drawn from the reference. When one-sided error is specified (by typing true for the parameter), only one of the two sequences will have error simulated, matching a mapping of a noisy sequence to a reference. If a set of k-mers for filtering is given, they are excluded when computing Jaccard similarity, both between random and overlapping sequences.
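The utilities.rst context describes additive error rates (10% insertion + 2% deletion + 1% substitution = 13%) and a comparison of k-mer Jaccard similarity between an overlapping pair and two random sequences. The sketch below reproduces that idea in a few lines; it is not KmerStatSimulator's code. The per-base error model (one uniform draw per base, with the three error types mutually exclusive so their rates simply add), the default k, and the names `add_errors`, `kmer_jaccard` and `simulate_pair` are assumptions made for illustration.

```python
import random

def add_errors(seq, ins, dele, sub, rng):
    """Apply per-base insertion/deletion/substitution errors.

    Assumption: the three events are mutually exclusive per base, so the
    total error rate is ins + dele + sub (e.g. 0.10 + 0.02 + 0.01 = 0.13).
    """
    if not 0 <= ins + dele + sub <= 1:
        raise ValueError("error rates must sum to a value in [0, 1]")
    out = []
    for base in seq:
        r = rng.random()
        if r < ins:
            out.append(rng.choice("ACGT"))  # insert a random base before this one
            out.append(base)
        elif r < ins + dele:
            continue  # delete this base
        elif r < ins + dele + sub:
            out.append(rng.choice([b for b in "ACGT" if b != base]))  # always a different base
        else:
            out.append(base)
    return "".join(out)

def kmer_jaccard(a, b, k):
    """Exact Jaccard similarity of the k-mer sets of a and b."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(ka & kb) / len(ka | kb) if ka | kb else 1.0

def simulate_pair(length, k, ins, dele, sub, one_sided=False, seed=0):
    """Return (overlapping-pair Jaccard, random-pair Jaccard) for one trial.

    With one_sided=True only one copy receives errors, like mapping a
    noisy read to an error-free reference.
    """
    rng = random.Random(seed)
    truth = "".join(rng.choice("ACGT") for _ in range(length))
    left = truth if one_sided else add_errors(truth, ins, dele, sub, rng)
    right = add_errors(truth, ins, dele, sub, rng)
    rand_a = "".join(rng.choice("ACGT") for _ in range(length))
    rand_b = "".join(rng.choice("ACGT") for _ in range(length))
    return kmer_jaccard(left, right, k), kmer_jaccard(rand_a, rand_b, k)
```

Repeating `simulate_pair` over many seeds yields the two distributions the text describes; at a 13% error rate the overlapping-pair similarity is small in absolute terms but still well separated from the near-zero random-pair similarity.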