Merge branch 'dev' for 1.2.0 release.
lemieuxl committed Jun 16, 2015
2 parents 5436fde + dd1955e commit cb14915
Showing 24 changed files with 1,307 additions and 148 deletions.
13 changes: 8 additions & 5 deletions README.mkd
Expand Up @@ -4,7 +4,7 @@

# genipe - A Python module to perform genome-wide imputation analysis

*Version 1.1.0*
*Version 1.2.0*

The `genipe` module (standing for **GEN**ome-wide **I**mputation
**P**ipelin**E**) includes a script (named `genipe-launcher`) that
Expand Down Expand Up @@ -101,11 +101,11 @@ usage: genipe-launcher [-h] [-v] [--debug] [--thread THREAD] --bfile PREFIX
--legend-template TEMPLATE --map-template TEMPLATE
--sample-file FILE [--filtering-rules RULE [RULE ...]]
[--probability FLOAT] [--completion FLOAT]
[--report-number NB] [--report-title TITLE]
[--report-author AUTHOR]
[--info FLOAT] [--report-number NB]
[--report-title TITLE] [--report-author AUTHOR]

Execute the genome-wide imputation pipeline. This script is part of the
'genipe' package, version 1.1.0.
'genipe' package, version 1.2.0.

optional arguments:
-h, --help show this help message and exit
Expand Down Expand Up @@ -164,6 +164,9 @@ IMPUTE2 Merger Options:
--probability FLOAT The probability threshold for no calls. [<0.9]
--completion FLOAT The completion rate threshold for site exclusion.
[<0.98]
--info FLOAT The measure of the observed statistical information
associated with the allele frequency estimate
threshold for site exclusion. [<0.00]

Automatic Report Options:
--report-number NB The report number. [genipe automatic report]
Expand Down Expand Up @@ -209,7 +212,7 @@ usage: imputed-stats [-h] [-v] {cox,linear,logistic,mixedlm,skat} ...

Performs statistical analysis on imputed data (either SKAT analysis, or
linear, logistic or survival regression). This script is part of the 'genipe'
package, version 1.1.0).
package, version 1.2.0).

optional arguments:
-h, --help show this help message and exit
Expand Down
Binary file modified docs/_static/tutorial/report.pdf
Binary file not shown.
24 changes: 17 additions & 7 deletions docs/index.rst
Expand Up @@ -14,10 +14,17 @@ The :py:mod:`genipe` (GENome-wide Imputation PipelinE) module provides an easy
and efficient way of performing genome-wide imputation analysis using three
commonly used software packages: `PLINK <http://pngu.mgh.harvard.edu/~purcell/plink/>`_,
`SHAPEIT <https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html>`_ and
`IMPUTE2 <https://mathgen.stats.ox.ac.uk/impute/impute_v2.html>`_. It also
provides a useful standalone tool to perform statistical analysis on imputed
(dosage) data (such as linear, logistic or survival regressions, or
`SKAT <http://www.hsph.harvard.edu/skat/>`_ analysis of rare variants).
`IMPUTE2 <https://mathgen.stats.ox.ac.uk/impute/impute_v2.html>`_.

A quality metrics report is automatically generated at the end of the
imputation process to easily assess the quality of the analysis. The report is
compiled into a PDF. For information on how to compile the report, refer to the
:ref:`genipe-tut-compile-report` section in the main :ref:`genipe-tut-page`.

Finally, it also provides a useful standalone tool to perform statistical
analysis on imputed (dosage) data (such as linear, logistic or survival
regressions, or `SKAT <http://www.hsph.harvard.edu/skat/>`_ analysis of rare
variants).

.. toctree::
:maxdepth: 2
Expand Down Expand Up @@ -46,11 +53,11 @@ Usage
--legend-template TEMPLATE --map-template TEMPLATE
--sample-file FILE [--filtering-rules RULE [RULE ...]]
[--probability FLOAT] [--completion FLOAT]
[--report-number NB] [--report-title TITLE]
[--report-author AUTHOR]
[--info FLOAT] [--report-number NB]
[--report-title TITLE] [--report-author AUTHOR]
Execute the genome-wide imputation pipeline. This script is part of the
'genipe' package, version 1.1.0.
'genipe' package, version 1.2.0.
optional arguments:
-h, --help show this help message and exit
Expand Down Expand Up @@ -109,6 +116,9 @@ Usage
--probability FLOAT The probability threshold for no calls. [<0.9]
--completion FLOAT The completion rate threshold for site exclusion.
[<0.98]
--info FLOAT The measure of the observed statistical information
associated with the allele frequency estimate
threshold for site exclusion. [<0.00]
Automatic Report Options:
--report-number NB The report number. [genipe automatic report]
Expand Down
7 changes: 1 addition & 6 deletions docs/installation.rst
Expand Up @@ -154,17 +154,12 @@ Testing the installation
-------------------------

To test the installation, make sure that the virtual environment is activated.
Then, launch python and use the following commands:
Then, launch Python and use the following Python commands:

.. code-block:: python
>>> import genipe
>>> genipe.test()
......................ss.ss.......................ss...ss...s.s.........
----------------------------------------------------------------------
Ran 72 tests in 107.268s
OK (skipped=10)
.. _install-update:
Expand Down
9 changes: 9 additions & 0 deletions docs/module_content/genipe.tests.rst
Expand Up @@ -42,6 +42,15 @@ genipe.tests.test_formats module
:show-inheritance:


genipe.tests.test_impute2_extractor module
-------------------------------------------

.. automodule:: genipe.tests.test_impute2_extractor
:members:
:undoc-members:
:show-inheritance:


genipe.tests.test_impute2_merger module
----------------------------------------

Expand Down
1 change: 1 addition & 0 deletions docs/output_files.rst
Expand Up @@ -37,6 +37,7 @@ In summary, here is the structure of the output files. Again, refer to the
│ ├── chr1.imputed.completion_rates
│ ├── chr1.imputed.good_sites
│ ├── chr1.imputed.impute2.gz
│ ├── chr1.imputed.impute2_info
│ ├── chr1.imputed.imputed_sites
│ ├── chr1.imputed.log
│ ├── chr1.imputed.maf
Expand Down
2 changes: 1 addition & 1 deletion docs/tutorials/tutorial_cox.rst
Expand Up @@ -233,7 +233,7 @@ in the console:
NAME
Performs a survival regression on imputed data using Cox's proportional hazard
model. This script is part of the 'genipe' package, version 1.1.0).
model. This script is part of the 'genipe' package, version 1.2.0).
optional arguments:
-h, --help show this help message and exit
Expand Down
45 changes: 26 additions & 19 deletions docs/tutorials/tutorial_extract.rst
Expand Up @@ -16,11 +16,12 @@ Site extraction

Genome-wide imputation datasets might be huge. Often, it is required to extract
a subset of imputed sites (*e.g.* specific markers, genomic location, or
markers with a specific minor allele frequency). Also, different formats might
be required, depending on the underlying analysis (*e.g.* hard calls or dosage
values). We provide an easy tool to perform site extraction from multiple
*impute2* files using either marker identification numbers, or genomic location
and/or minor allele frequency and/or call rate.
markers with a specific minor allele frequency, information value or completion
rate). Also, different formats might be required, depending on the underlying
analysis (*e.g.* hard calls or dosage values). We provide an easy tool to
perform site extraction from multiple *impute2* files using either marker
identification numbers, or genomic location and/or minor allele frequency and/or
call rate and/or information value.

We suppose that you have followed the main :ref:`genipe-tut-page`. The
following command will create the working directory for this tutorial.
Expand All @@ -42,7 +43,7 @@ extraction tools are automatically created in the ``final_impute2`` directories

The files that are required in these directories depend on what kind of
extraction is required (by name, or by genomic location and/or by minor allele
frequency and/or by calling rate).
frequency and/or by calling rate and/or by information value).

Once the required *impute2* files are provided to the tool, the other required
files will be automatically fetched (if required).
Expand All @@ -56,12 +57,13 @@ Executing the extraction
The first time the tool is used on a set of *impute2* files, indexing will
automatically occur (to speed up the analysis for future extractions). There are
two ways to extract markers: using their identification number (``--extract``),
or properties (``--genomic``, ``--maf`` or ``rate``).
or using their properties (``--genomic``, ``--maf``, ``--rate`` and/or
``--info``).

.. note::

It is possible to extract from multiple *impute2* files at the same time (by
specifying multiple input files.
specifying multiple input files).


Extraction by ID
Expand All @@ -84,7 +86,7 @@ This ``marker_list.txt`` file will contain the following:
rs76139713:51137523:C:T
rs372879164:17037188:A:G
Then, the following command (using the ``--extract`` option will extract those
Then, the following command (using the ``--extract`` option) will extract those
two markers from the *impute2* file.

.. code-block:: bash
Expand All @@ -103,13 +105,14 @@ two markers from the *impute2* file.
Extraction by characteristics
""""""""""""""""""""""""""""""

There are three ways to extract markers according to their characteristics. The
There are four ways to extract markers according to their characteristics. The
first way is to specify the genomic location of the markers to extract (*i.e.*
the ``--genomic`` option). The second way is to specify a minor allele
frequency threshold (*i.e.* the ``--maf`` option). The third and final way is
to specify a call rate threshold (*i.e.* the ``--rate`` option). Those three
ways can be used at the same time (*e.g.* to get markers in a specific genomic
range and a specific call rate).
frequency threshold (*i.e.* the ``--maf`` option). The third way is to specify
a call rate threshold (*i.e.* the ``--rate`` option). The fourth and final way
is to specify an information value threshold (*i.e.* the ``--info`` option).
Those four ways can be used at the same time (*e.g.* to get markers in a
specific genomic range and a specific call rate).
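
As a rough illustration of how the four criteria combine, here is a minimal
Python sketch. It is not genipe's implementation: the ``Marker`` class, the
``keep_marker`` function and the example values are hypothetical, and the
genomic range is assumed to be inclusive. Following the help text, a marker is
kept when its value is equal to or higher than each threshold that was
specified.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    """Hypothetical per-marker summary (not genipe's internal structure)."""
    name: str
    chrom: str
    pos: int
    maf: float    # minor allele frequency
    rate: float   # completion (call) rate
    info: float   # information value


def keep_marker(marker, genomic=None, maf=None, rate=None, info=None):
    """Return True if the marker passes every criterion that was specified.

    Criteria left to None are ignored, so any combination can be used.
    """
    if genomic is not None:
        chrom, start, end = genomic
        # Assumption: both ends of the range are inclusive.
        if marker.chrom != chrom or not (start <= marker.pos <= end):
            return False
    if maf is not None and marker.maf < maf:
        return False
    if rate is not None and marker.rate < rate:
        return False
    if info is not None and marker.info < info:
        return False
    return True


# Made-up markers, for illustration only.
markers = [
    Marker("rs1", "22", 42_523_000, maf=0.12, rate=0.99, info=0.95),
    Marker("rs2", "22", 42_524_000, maf=0.01, rate=0.99, info=0.97),
    Marker("rs3", "22", 42_525_000, maf=0.20, rate=0.99, info=0.40),
    Marker("rs4", "1", 1_000, maf=0.30, rate=1.00, info=0.99),
]

selected = [
    m.name for m in markers
    if keep_marker(m, genomic=("22", 42_522_000, 42_527_000), maf=0.05, info=0.8)
]
print(selected)
```

In this sketch, ``rs2`` fails the MAF threshold, ``rs3`` fails the information
value threshold and ``rs4`` lies outside the genomic range, so only ``rs1`` is
kept.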

For example, to extract markers with a MAF :math:`\geq` 0.05 located in the
*CYP2D6* gene, perform the following command:
Expand Down Expand Up @@ -233,10 +236,10 @@ analysis in the console:
[--out PREFIX] [--format FORMAT [FORMAT ...]]
[--prob FLOAT] [--extract FILE]
[--genomic CHR:START-END] [--maf FLOAT]
[--rate FLOAT]
[--rate FLOAT] [--info FLOAT]
Extract imputed markers located in a specific genomic region. This script is
part of the 'genipe' package, version 1.1.0).
part of the 'genipe' package, version 1.2.0).
optional arguments:
-h, --help show this help message and exit
Expand All @@ -262,11 +265,15 @@ analysis in the console:
--extract FILE File containing marker names to extract.
--genomic CHR:START-END
The range to extract (e.g. 22 1000000 1500000). Can be
use in combination with '--rate' and '--maf'.
use in combination with '--rate', '--maf' and '--
info'.
--maf FLOAT Extract markers with a minor allele frequency equal or
higher than the specified threshold. Can be use in
combination with '--rate' and '--genomic'.
combination with '--rate', '--info' and '--genomic'.
--rate FLOAT Extract markers with a completion rate equal or higher
to the specified threshold. Can be use in combination
with '--maf' and '--genomic'.
with '--maf', '--info' and '--genomic'.
--info FLOAT Extract markers with an information equal or higher to
the specified threshold. Can be use in combination
with '--maf', '--rate' and '--genomic'.
38 changes: 35 additions & 3 deletions docs/tutorials/tutorial_genipe.rst
Expand Up @@ -10,7 +10,8 @@ Quick navigation
1. :ref:`genipe-tut-softwares`
2. :ref:`genipe-tut-input-files`
3. :ref:`genipe-tut-execute`
4. :ref:`genipe-tut-output-files`
4. :ref:`genipe-tut-compile-report`
5. :ref:`genipe-tut-output-files`

Genome-wide imputation pipeline
--------------------------------
Expand Down Expand Up @@ -410,6 +411,28 @@ previous command (see the :ref:`genipe-usage` section for a full list):
subsequent steps).


.. _genipe-tut-compile-report:

Compiling the report
^^^^^^^^^^^^^^^^^^^^^

A report containing useful information (such as quality metrics and execution
time, among others) is automatically generated once the imputation process is
completed. To compile the report, perform the following commands:

.. code-block:: bash
cd $HOME/genipe_tutorial/genipe/report
make && make clean
This will generate the following
`PDF report <http://pgxcentre.github.io/genipe/_static/tutorial/report.pdf>`_
(which is named ``report.pdf``). It is always possible to modify the original
``report.tex`` file to include analysis-specific details (*e.g.* cohort
description).


.. _genipe-tut-output-files:

Output files
Expand Down Expand Up @@ -448,6 +471,7 @@ files.
│ ├── chr1.imputed.completion_rates
│ ├── chr1.imputed.good_sites
│ ├── chr1.imputed.impute2.gz
│ ├── chr1.imputed.impute2_info
│ ├── chr1.imputed.imputed_sites
│ ├── chr1.imputed.log
│ ├── chr1.imputed.maf
Expand Down Expand Up @@ -552,12 +576,20 @@ autosomal chromosomes. They will contain the following files:
| | the user, where the default is higher |
| | than or equal to 0.9). |
+-------------------------------+-----------------------------------------+
| ``.imputed.impute2`` | Imputation results (merged from the |
| | individual segment files. This file |
| ``.imputed.impute2`` or | Imputation results (merged from the |
| ``.imputed.impute2.gz`` | individual segment files). This file |
| | might be compressed (with the ``.gz`` |
| | extension) if the ``--bgzip`` option was|
| | used when launching the pipeline. |
+-------------------------------+-----------------------------------------+
| ``.imputed.impute2_info`` | Marker-wise information file with one |
| | line per marker and a single header line|
| | at the beginning. It contains, among |
| | others, the information value which is a|
| | measure of the observed statistical |
| | information associated with the allele |
| | frequency estimate. |
+-------------------------------+-----------------------------------------+
| ``.imputed.imputed_sites`` | List of imputed sites (excluding sites |
| | that were previously genotyped in the |
| | study cohort). |
Expand Down
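
The ``.imputed.impute2_info`` file described in the table above (one header
line, then one line per marker) could be read along these lines. This is only
a sketch under assumptions: the column names ``name`` and ``info``, the tab
delimiter and the example content are hypothetical and should be checked
against the header of an actual file.

```python
import csv
import io

# Hypothetical file content; a real file would be opened with open(...).
example = io.StringIO(
    "name\tinfo\n"
    "rs1\t0.95\n"
    "rs2\t0.30\n"
    "rs3\t1.00\n"
)


def read_info(handle, name_col="name", info_col="info"):
    """Map each marker name to its information value.

    The column names are assumptions; adjust them to the real header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[name_col]: float(row[info_col]) for row in reader}


info_values = read_info(example)

# Markers whose information value falls below a chosen threshold.
low_info = sorted(name for name, value in info_values.items() if value < 0.8)
print(low_info)
```

The 0.8 threshold is arbitrary and only serves the example.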
2 changes: 1 addition & 1 deletion docs/tutorials/tutorial_linear.rst
Expand Up @@ -245,7 +245,7 @@ analysis in the console:
--pheno-name NAME
Performs a linear regression (ordinary least squares) on imputed data. This
script is part of the 'genipe' package, version 1.1.0).
script is part of the 'genipe' package, version 1.2.0).
optional arguments:
-h, --help show this help message and exit
Expand Down
2 changes: 1 addition & 1 deletion docs/tutorials/tutorial_logistic.rst
Expand Up @@ -236,7 +236,7 @@ regression analysis in the console:
--pheno-name NAME
Performs a logistic regression on imputed data using a GLM with a binomial
distribution. This script is part of the 'genipe' package, version 1.1.0).
distribution. This script is part of the 'genipe' package, version 1.2.0).
optional arguments:
-h, --help show this help message and exit
Expand Down
2 changes: 1 addition & 1 deletion docs/tutorials/tutorial_mixedlm.rst
Expand Up @@ -255,7 +255,7 @@ effects analysis in the console:
Performs a linear mixed effects regression on imputed data using a random
intercept for each group. This script is part of the 'genipe' package, version
1.1.0).
1.2.0).
optional arguments:
-h, --help show this help message and exit
Expand Down