Update docs for 1.6
skoren committed May 25, 2015
1 parent c89d035 commit 8759d98
Showing 4 changed files with 20 additions and 20 deletions.
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -10,7 +10,7 @@ sequence overlapping algorithm. Designed to efficiently detect all overlaps
between noisy long-read sequence data. It efficiently estimates Jaccard similarity
by compressing sequences to their representative fingerprints composed on min-mers (minimum k-mer).
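
The fingerprint idea in the paragraph above can be illustrated with a generic MinHash sketch. This is a minimal illustration under stated assumptions, not MHAP's actual implementation: the function names (``kmers``, ``hash_kmer``, ``sketch``, ``estimate_jaccard``) and the choice of BLAKE2b as the hash are hypothetical and were picked only to keep the example self-contained.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer, seed):
    """Deterministic 64-bit hash of a k-mer; the seed selects the hash function."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def sketch(seq, k, num_hashes):
    """For each hash function keep the k-mer with the smallest hash (the min-mer).

    Assumes len(seq) >= k; min() raises ValueError on an empty k-mer set.
    """
    ks = kmers(seq, k)
    return [min(ks, key=lambda km: hash_kmer(km, seed)) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of hash functions whose min-mers agree between the two sketches."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)
```

Two sequences sketched with the same ``k`` and ``num_hashes`` can then be compared with ``estimate_jaccard`` without revisiting the full k-mer sets. More hash functions give a tighter estimate at the cost of a larger fingerprint.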

- MHAP is included within the Celera Assembler `PBcR <http://wgs-assembler.sourceforge.net/wiki/index.php?title=PBcR>`_ pipeline. The Celera Assembler can be downloaded `here <https://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.2/>`_.
+ MHAP is included within the Celera Assembler `PBcR <http://wgs-assembler.sourceforge.net/wiki/index.php?title=PBcR>`_ pipeline. The Celera Assembler can be downloaded `here <https://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.3/>`_.

Contents:

26 changes: 13 additions & 13 deletions docs/source/installation.rst
@@ -4,11 +4,11 @@ Installation

Before you start
=================
- MHAP requires a recent version of the `JVM <http://www.oracle.com/technetwork/java/javase/downloads/jre7-downloads-1880261.html>`_ (1.7u51+). JDK 1.6 or earlier will not work. If you would like to build the code from source, you need to have the `JDK <http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html>`_ and the `ANT <http://ant.apache.org/>`_ build system available.
+ MHAP requires a recent version of the `JVM <http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html>`_ (1.8+). JDK 1.7 or earlier will not work. If you would like to build the code from source, you need to have the `JDK <http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html>`_ and the `ANT <http://ant.apache.org/>`_ build system available.

Prerequisites
==============
- * java (1.7u51+)
+ * java (1.8+)
* ant (1.8.2+)

Here is a list of currently supported Operating Systems:
@@ -26,19 +26,19 @@ The pre-compiled version is recommended to users who want to run MHAP, without d

.. code-block:: bash
- $ wget https://github.com/marbl/MHAP/releases/download/v1.0/mhap-1.0.tar.gz
+ $ wget https://github.com/marbl/MHAP/releases/download/1.6/mhap-1.6.tar.gz
And if ``wget`` is not available, you can use ``curl`` instead:

.. code-block:: bash
- $ curl -L https://github.com/marbl/MHAP/releases/download/v1.0/mhap-1.0.tar.gz > mhap-1.0.tar.gz
+ $ curl -L https://github.com/marbl/MHAP/releases/download/1.6/mhap-1.6.tar.gz > mhap-1.6.tar.gz
Then run

.. code-block:: bash
- $ tar xvzf mhap-1.0.tar.gz
+ $ tar xvzf mhap-1.6.tar.gz
Source
-----------------
@@ -47,7 +47,7 @@ To build the code from the release:

.. code-block:: bash
- $ wget https://github.com/marbl/MHAP/archive/v1.0.zip
+ $ wget https://github.com/marbl/MHAP/archive/1.6.zip
If you see a certificate not trusted error, you can add the following option to wget:

@@ -59,27 +59,27 @@ And if ``wget`` is not available, you can use ``curl`` instead:

.. code-block:: bash
- $ curl -L https://github.com/marbl/MHAP/archive/v1.0.zip > v1.0.zip
+ $ curl -L https://github.com/marbl/MHAP/archive/1.6.zip > 1.6.zip
- You can also browse the https://github.com/marbl/MHAP/tree/v1.0
+ You can also browse the https://github.com/marbl/MHAP/tree/1.6
and click on Downloads.

Once downloaded, extract to unpack:

.. code-block:: bash
- $ unzip v1.0.zip
+ $ unzip 1.6.zip
- Change to MetAMOS directory:
+ Change to MHAP directory:

.. code-block:: bash
- $ cd MHAP-1.0
+ $ cd MHAP-1.6
- Once inside the MetAMOS directory, run:
+ Once inside the MHAP directory, run:

.. code-block:: bash
$ ant
- This will compile the program and create a target/mhap-1.0.jar file which you can use to run MHAP. The quick-start instructions assume you are in the target directory when running the program. You can also use the target/mhap-0.1.tar file to copy MHAP to a different system or directory.
+ This will compile the program and create a target/mhap-1.6.jar file which you can use to run MHAP. The quick-start instructions assume you are in the target directory when running the program. You can also use the target/mhap-1.6.tar file to copy MHAP to a different system or directory.
6 changes: 3 additions & 3 deletions docs/source/quickstart.rst
@@ -9,7 +9,7 @@ Running MHAP provides command-line documentation if you run it without parameters

.. code-block:: bash
- $ java -jar mhap-1.0.jar
+ $ java -jar mhap-1.6.jar
MHAP has two main usage modes. The first finds all overlaps between the input sequences. The second only constructs an index, which can be reused in subsequent runs.

@@ -18,7 +18,7 @@ Finding overlaps

.. code-block:: bash
- $ java -Xmx32g -server -jar mhap-1.0.jar -s<fasta/dat from/self file> [-q<fasta/dat to file or directory>] [-f<kmer filter list, must be sorted>]
+ $ java -Xmx32g -server -jar mhap-1.6.jar -s<fasta/dat from/self file> [-q<fasta/dat to file or directory>] [-f<kmer filter list, must be sorted>]
Both the -s and -q options can accept either FastA sequences or binary dat files (generated as described below). The -q option can accept either a file or a directory, in which case all FastA/dat files in the specified directory will be used. By default, only the sequences specified by -s are indexed and the sequences in -q are streamed against the constructed index. Since MHAP is written in Java, the memory usage can be high. Generally, 32GB of RAM is sufficient to index 20K sequences. If you have more sequences, you can partition your data and run MHAP on the partitions. You can also increase the memory MHAP is allowed to use by changing the Xmx parameter to a larger limit.
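
The partitioning step mentioned above could look roughly like the following. This is a sketch under stated assumptions: ``partition_fasta`` is a hypothetical helper, not part of MHAP, and it only groups FASTA lines into chunks of at most ``max_seqs`` records. Writing each chunk to its own file is left to the caller.

```python
def partition_fasta(lines, max_seqs):
    """Yield lists of FASTA lines, each holding at most max_seqs records.

    A record starts at a line beginning with '>' and runs to the next such line.
    """
    chunk, count = [], 0
    for line in lines:
        if line.startswith(">"):
            if count == max_seqs:
                # Current chunk is full: emit it and start a new one.
                yield chunk
                chunk, count = [], 0
            count += 1
        chunk.append(line)
    if chunk:
        yield chunk
```

With the rule of thumb quoted above (32GB for about 20K sequences), ``max_seqs=20000`` would be a natural starting point. The right value depends on read length and available memory.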

@@ -36,7 +36,7 @@ Constructing binary index

.. code-block:: bash
- $ java -Xmx32g -server -jar mhap-1.0.jar -p<directory of fasta files> -q <output directory> [-f<kmer filter list, must be sorted>]
+ $ java -Xmx32g -server -jar mhap-1.6.jar -p<directory of fasta files> -q <output directory> [-f<kmer filter list, must be sorted>]
In this use case, files in the -p directory will be converted to binary dat files in the -q directory. Subsequent runs using the dat files (instead of FastA files) will be faster as the sequences no longer need to be indexed, only loaded into memory.

6 changes: 3 additions & 3 deletions docs/source/utilities.rst
@@ -14,7 +14,7 @@ Assuming you have a mapping of sequences to a truth (such as a reference genome)

.. code-block:: bash
- $ java -cp mhap-1.0.jar edu.umd.marbl.mhap.main.EstimateROC <reference mapping M4> <overlaps M4/MHAP> <fasta of sequences> [minimum overlap length to evaluate] [number of random trials] [use dynamic programming] [verbose]
+ $ java -cp mhap-1.6.jar edu.umd.marbl.mhap.main.EstimateROC <reference mapping M4> <overlaps M4/MHAP> <fasta of sequences> [minimum overlap length to evaluate] [number of random trials] [use dynamic programming] [verbose]
The default minimum overlap length is 2000 and the default number of trials is 10000. This will estimate sensitivity/specificity to within 1%. The number of trials can be increased at the expense of runtime. Specifying 0 will examine all possible N^2 overlap pairs. If dynamic programming is turned on (by typing true for the parameter), overlaps not present in the reference mapping will be confirmed if a Smith-Waterman alignment can identify the overlap specified.
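
One plausible reading of the "within 1%" figure is the usual worst-case margin of error for a proportion estimated from random samples. The docs do not state how the number was derived, so the calculation below is an assumption. ``worst_case_margin`` is a hypothetical helper that uses the normal approximation to the binomial.

```python
import math


def worst_case_margin(num_trials, z=1.96):
    """Approximate 95% margin of error for a proportion estimated from
    num_trials independent samples, taking the worst case p = 0.5."""
    return z * math.sqrt(0.25 / num_trials)
```

For the default of 10000 trials this gives a margin just under 0.01, consistent with the statement above. Quadrupling the number of trials would roughly halve the margin.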

@@ -25,12 +25,12 @@ MHAP includes a tool to simulate sequencing data with random error as well as es

.. code-block:: bash
- $ java -cp mhap-1.0.jar edu.umd.marbl.mhap.main.KmerStatSimulator <# sequences> <sequence length (bp)> <insertion error rate> <deletion error rate> <substitution error rate> [reference genome]
+ $ java -cp mhap-1.6.jar edu.umd.marbl.mhap.main.KmerStatSimulator <# sequences> <sequence length (bp)> <insertion error rate> <deletion error rate> <substitution error rate> [reference genome]
The error rates must be between 0 and 1 and are additive. Specifying 10% insertion, 2% deletion, and 1% substitution will result in sequences with a 13% error rate. If no reference sequence is given, completely random sequences are generated and errors added. Otherwise, random sequences are drawn from the reference and errors added. Errors are added randomly with no bias.
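
To make the additive error model concrete, here is a small sketch of how per-base errors could be drawn so that the three rates sum to the overall error rate. It is not the code used by ``KmerStatSimulator``. ``add_errors`` and its parameters are hypothetical names chosen for illustration.

```python
import random

BASES = "ACGT"


def add_errors(seq, ins_rate, del_rate, sub_rate, rng):
    """Apply at most one error per base; the total error rate is the sum of the rates."""
    total = ins_rate + del_rate + sub_rate
    if min(ins_rate, del_rate, sub_rate) < 0.0 or total > 1.0:
        raise ValueError("rates must be non-negative and sum to at most 1")
    out = []
    for base in seq:
        u = rng.random()  # uniform in [0, 1)
        if u < ins_rate:
            # Insertion: emit a random base, then keep the original.
            out.append(rng.choice(BASES))
            out.append(base)
        elif u < ins_rate + del_rate:
            # Deletion: drop the original base.
            continue
        elif u < total:
            # Substitution: replace with a different base.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

Calling ``add_errors(seq, 0.10, 0.02, 0.01, random.Random(0))`` would touch about 13% of positions on average, matching the example in the text.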

.. code-block:: bash
- $ java -cp mhap-1.0.jar edu.umd.marbl.mhap.main.KmerStatSimulator <# trials> <kmer size> <sequence length> <overlap length> <insertion error rate> <deletion error rate> <substitution error rate> [one-sided error] [reference genome] [kmer filter]
+ $ java -cp mhap-1.6.jar edu.umd.marbl.mhap.main.KmerStatSimulator <# trials> <kmer size> <sequence length> <overlap length> <insertion error rate> <deletion error rate> <substitution error rate> [one-sided error] [reference genome] [kmer filter]
This usage will output a distribution of Jaccard similarity between a pair of overlapping sequences with the specified error rate (when using the specified k-mer size) and two random sequences of the same length. If no reference sequence is given, completely random sequences are generated and errors added, otherwise sequences are drawn from the reference. When one-sided error is specified (by typing true for the parameter), only one of the two sequences will have error simulated, matching a mapping of a noisy sequence to a reference. If a set of k-mers for filtering is given, they are excluded when computing Jaccard similarity, both between random and overlapping sequences.
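
For reference, the quantity being reported can be computed exactly for a single pair of sequences as follows. This is a minimal sketch, not MHAP code: ``kmer_set`` and ``jaccard`` are hypothetical helpers, and the optional ``filtered`` argument mimics excluding a k-mer filter list before the similarity is computed.

```python
def kmer_set(seq, k, filtered=frozenset()):
    """Set of k-mers in seq, minus any k-mers listed in filtered."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)} - set(filtered)


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Comparing the value for an overlapping pair with the value for two unrelated sequences of the same length shows how well a given k-mer size and error rate separate true overlaps from random matches.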
