Merge branch 'develop'
lemieuxl committed Feb 3, 2016
2 parents 64ba7ba + 1e435e0 commit 06ef9d6
Showing 31 changed files with 1,649 additions and 193 deletions.
4 changes: 2 additions & 2 deletions MANIFEST.in
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
include README.mkd
include configuration_example_1_of_2.conf
include configuration_example_2_of_2.conf
include configuration_example_1_of_2.ini
include configuration_example_2_of_2.ini
include LICENSE.txt
Original file line number Diff line number Diff line change
@@ -5,6 +5,27 @@

[1]
# ##############################################################################
# Checks sample contamination using the bafRegress tool
# (http://genome.sph.umich.edu/wiki/BAFRegress). Field names can be modified
# using options (as described below).
# ##############################################################################

script = contamination
raw-dir = /PATH/TO/DIRECTORY/CONTAINING/INTENSITIES.txt
# colsample = Sample Name
# colmarker = SNP Name
# colbaf = B Allele Freq
# colab1 = Allele1 - AB
# colab2 = Allele2 - AB
# sge
# sge-walltime = WRITE WALLTIME ONLY IF REQUIRED
# sge-nodes = WRITE NB NODES AND NB PROCESSOR PER NODE ONLY IF REQUIRED
# sample-per-run-for-sge = 30



[2]
# ##############################################################################
# Checks missing rate and pairwise concordance of duplicated samples. Duplicated
# samples should have same family and individual identification numbers. The
# names can be modified directly in the transposed pedfile.
@@ -16,7 +37,7 @@ script = duplicated_samples



[2]
[3]
# ##############################################################################
# Checks missing rate and pairwise concordance of duplicated markers. Duplicated
# markers are found by looking at their chromosomal position. No modification of
@@ -30,7 +51,7 @@ script = duplicated_snps



[3]
[4]
# ##############################################################################
# Finds and removes markers which have a missing rate of 100% or markers (not
# located on mitochondrial chromosome) that have a heterozygosity rate of 0%.
@@ -40,7 +61,7 @@ script = noCall_hetero_snps



[4]
[5]
# ##############################################################################
# Removes samples with a missing rate higher than a user defined threshold. For
# this step, we recommend using a threshold of 10% missing rate as samples with
@@ -52,7 +73,7 @@ script = sample_missingness



[5]
[6]
# ##############################################################################
# Removes markers with a missing rate higher than a user defined threshold. For
# this step, we recommend using a threshold of 2% missing rate.
@@ -63,7 +84,7 @@ script = snp_missingness



[6]
[7]
# ##############################################################################
# Removes samples with a missing rate higher than a user defined threshold. For
# this step, we recommend using a threshold of 2% missing rate.
@@ -74,7 +95,7 @@ mind = 0.02



[7]
[8]
# ##############################################################################
# Using PLINK, finds samples with gender issues, according to heterozygosity
# rate on the X chromosome. If you want to produce a gender plot, you need to
@@ -95,10 +116,11 @@ script = sex_check
# lrr-baf
# lrr-baf-raw-dir = /PATH/TO/DIRECTORY/CONTAINING/BAF_LRR_FILES.txt
# lrr-baf-format = png
# lrr-baf-dpi = 300



[8]
[9]
# ##############################################################################
# Using PLINK, performs a plate bias analysis, using a p value threshold of
# 1.0e-7.
@@ -110,7 +132,7 @@ loop-assoc = /PATH/TO/FILE/CONTAINING/PLATE_INFORMATION.txt



[9]
[10]
# ##############################################################################
# Checks for related individuals and randomly keeps one of each related group. If
# you have a server with a DRMAA-compliant distributed resource management
@@ -130,7 +152,7 @@ script = find_related_samples



[10]
[11]
# ##############################################################################
# Using PLINK, computes the MDS value of each sample, and using three reference
# populations (CEU, YRI and JPT-CHB), finds outliers of one of those three
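The configuration file shown above is a sequence of numbered sections, each naming a `script` plus its options, with optional settings left commented out. As a minimal sketch of how such a file can be read, the snippet below uses Python's standard `configparser`. This is an assumption for illustration only and is not claimed to be how pyGenClean itself parses the file; the `EXAMPLE` text and the `list_steps` helper are hypothetical.

```python
import configparser

# Hypothetical excerpt modelled on the sections shown in the diff above.
EXAMPLE = """
[1]
script = contamination
raw-dir = /PATH/TO/DIRECTORY/CONTAINING/INTENSITIES.txt
# colsample = Sample Name
# sge

[2]
script = duplicated_samples
"""


def list_steps(text):
    """Return (step number, script name, options) for each numbered section."""
    # Lines starting with '#' are treated as comments, so commented-out
    # options such as "# sge" are simply ignored.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    steps = []
    for name in parser.sections():
        options = dict(parser[name])
        script = options.pop("script")
        steps.append((int(name), script, options))
    # Order the steps by their numeric section name.
    return sorted(steps, key=lambda step: step[0])


STEPS = list_steps(EXAMPLE)
for number, script, options in STEPS:
    print(number, script, options)
```

Uncommenting an option in the file (for example `colsample`) would make it appear in the `options` dictionary of that step.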
Original file line number Diff line number Diff line change
@@ -19,21 +19,25 @@
# directories using the PLINK's binary file located in
# "remove_heterozygous_haploid".

[11]
[12]
# ##############################################################################
# After manually checking that everything went fine in the previous steps, you
# need to create a list of samples to remove from steps [7] to [10] and a list
# of markers to exclude from steps [6]. Just create a file containing family and
# individual identification numbers for all those samples to remove.
# individual identification numbers for all those samples to remove. Note that
# the two options 'reason-marker' and 'reason-sample' are for the automatic
# report generated after the analysis.
# ##############################################################################

script = subset
reason-marker = reason for marker exclusion
reason-sample = reason for sample exclusion
remove = /PATH/TO/FILE/CONTAINING/ALL_SAMPLES_FROM_PREVIOUS_STEPS_TO_REMOVE.txt
exclude = /PATH/TO/FILE/CONTAINING/ALL_MARKERS_FROM_PREVIOUS_STEPS_TO_EXCLUDE.txt



[12]
[13]
# ##############################################################################
# Removes heterozygous haploid genotypes from the dataset.
# ##############################################################################
@@ -42,7 +46,7 @@ script = remove_heterozygous_haploid



[13]
[14]
# ##############################################################################
# Flags uninformative markers (with a MAF of 0). This step only flags markers.
# You might want to exclude them later on.
@@ -52,7 +56,7 @@ script = flag_maf_zero



[14]
[15]
# ##############################################################################
# Flags markers that fail HWE test for a p value of 1e-4 and after Bonferroni
# correction. This step only flags markers. You might want to exclude them later
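The `subset` step in the file above expects a file listing the family and individual identification numbers of the samples to remove. The following is a minimal sketch of producing such a file. The tab delimiter, the sample identifiers, the output file name and the `write_remove_file` helper are all assumptions made for illustration; the source only states that the file contains the two identifiers for each sample.

```python
import os
import tempfile


def write_remove_file(path, samples):
    """Write one 'family_id<TAB>individual_id' pair per line, skipping repeats."""
    seen = set()
    with open(path, "w") as handle:
        for family_id, individual_id in samples:
            if (family_id, individual_id) in seen:
                continue  # the same sample may be flagged by several steps
            seen.add((family_id, individual_id))
            handle.write(f"{family_id}\t{individual_id}\n")
    return len(seen)


# Hypothetical samples flagged while reviewing the earlier steps.
flagged = [("F1", "S1"), ("F2", "S7"), ("F1", "S1")]
out_path = os.path.join(tempfile.mkdtemp(), "samples_to_remove.txt")
written = write_remove_file(out_path, flagged)
print(written, "samples written to", out_path)
```

The resulting path is what would be given to the `remove` option; a file for the `exclude` option could be built the same way with one marker name per line, again as an assumption about the expected layout.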
Original file line number Diff line number Diff line change
@@ -5,6 +5,27 @@

[1]
# ##############################################################################
# Checks sample contamination using the bafRegress tool
# (http://genome.sph.umich.edu/wiki/BAFRegress). Field names can be modified
# using options (as described below).
# ##############################################################################

script = contamination
raw-dir = /PATH/TO/DIRECTORY/CONTAINING/INTENSITIES.txt
# colsample = Sample Name
# colmarker = SNP Name
# colbaf = B Allele Freq
# colab1 = Allele1 - AB
# colab2 = Allele2 - AB
# sge
# sge-walltime = WRITE WALLTIME ONLY IF REQUIRED
# sge-nodes = WRITE NB NODES AND NB PROCESSOR PER NODE ONLY IF REQUIRED
# sample-per-run-for-sge = 30



[2]
# ##############################################################################
# Checks missing rate and pairwise concordance of duplicated samples. Duplicated
# samples should have same family and individual identification numbers. The
# names can be modified directly in the transposed pedfile.
@@ -16,7 +37,7 @@ script = duplicated_samples



[2]
[3]
# ##############################################################################
# Checks missing rate and pairwise concordance of duplicated markers. Duplicated
# markers are found by looking at their chromosomal position. No modification of
@@ -30,7 +51,7 @@ script = duplicated_snps



[3]
[4]
# ##############################################################################
# Finds and removes markers which have a missing rate of 100% or markers (not
# located on mitochondrial chromosome) that have a heterozygosity rate of 0%.
@@ -40,7 +61,7 @@ script = noCall_hetero_snps



[4]
[5]
# ##############################################################################
# Removes samples with a missing rate higher than a user defined threshold. For
# this step, we recommend using a threshold of 10% missing rate as samples with
@@ -52,7 +73,7 @@ script = sample_missingness



[5]
[6]
# ##############################################################################
# Removes markers with a missing rate higher than a user defined threshold. For
# this step, we recommend using a threshold of 2% missing rate.
@@ -63,7 +84,7 @@ script = snp_missingness



[6]
[7]
# ##############################################################################
# Removes samples with a missing rate higher than a user defined threshold. For
# this step, we recommend using a threshold of 2% missing rate.
@@ -74,7 +95,7 @@ mind = 0.02



[7]
[8]
# ##############################################################################
# Using PLINK, finds samples with gender issues, according to heterozygosity
# rate on the X chromosome. If you want to produce a gender plot, you need to
@@ -95,10 +116,11 @@ script = sex_check
# lrr-baf
# lrr-baf-raw-dir = /PATH/TO/DIRECTORY/CONTAINING/BAF_LRR_FILES.txt
# lrr-baf-format = png
# lrr-baf-dpi = 300



[8]
[9]
# ##############################################################################
# Using PLINK, performs a plate bias analysis, using a p value threshold of
# 1.0e-7.
@@ -110,7 +132,7 @@ loop-assoc = /PATH/TO/FILE/CONTAINING/PLATE_INFORMATION.txt



[9]
[10]
# ##############################################################################
# Checks for related individuals and randomly keeps one of each related group. If
# you have a server with a DRMAA-compliant distributed resource management
@@ -130,7 +152,7 @@ script = find_related_samples



[10]
[11]
# ##############################################################################
# Using PLINK, computes the MDS value of each sample, and using three reference
# populations (CEU, YRI and JPT-CHB), finds outliers of one of those three
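Most of the hunks above exist only because inserting the new `contamination` step as `[1]` shifts every later section number up by one. A rough sketch of that renumbering is given below. The `shift_sections` helper and the sample text are hypothetical and are not part of the commit; the sketch only touches lines that consist of a bracketed number.

```python
import re

# A section header is a line holding nothing but a bracketed integer.
SECTION = re.compile(r"^\[(\d+)\]\s*$")


def shift_sections(text, start=1, offset=1):
    """Add `offset` to every numeric section header numbered `start` or above."""
    lines = []
    for line in text.splitlines():
        match = SECTION.match(line)
        if match and int(match.group(1)) >= start:
            line = f"[{int(match.group(1)) + offset}]"
        lines.append(line)
    return "\n".join(lines)


before = "[1]\nscript = duplicated_samples\n\n[2]\nscript = duplicated_snps"
after = shift_sections(before)
print(after)
```

Step numbers mentioned inside comments or documentation (for example "steps [7] to [10]") are not matched by this pattern, so such cross-references would still need a manual review after renumbering.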
Original file line number Diff line number Diff line change
@@ -19,21 +19,25 @@
# directories using the PLINK's binary file located in
# "remove_heterozygous_haploid".

[11]
[12]
# ##############################################################################
# After manually checking that everything went fine in the previous steps, you
# need to create a list of samples to remove from steps [7] to [10] and a list
# of markers to exclude from steps [6]. Just create a file containing family and
# individual identification numbers for all those samples to remove.
# individual identification numbers for all those samples to remove. Note that
# the two options 'reason-marker' and 'reason-sample' are for the automatic
# report generated after the analysis.
# ##############################################################################

script = subset
reason-marker = reason for marker exclusion
reason-sample = reason for sample exclusion
remove = /PATH/TO/FILE/CONTAINING/ALL_SAMPLES_FROM_PREVIOUS_STEPS_TO_REMOVE.txt
exclude = /PATH/TO/FILE/CONTAINING/ALL_MARKERS_FROM_PREVIOUS_STEPS_TO_EXCLUDE.txt



[12]
[13]
# ##############################################################################
# Removes heterozygous haploid genotypes from the dataset.
# ##############################################################################
@@ -42,7 +46,7 @@ script = remove_heterozygous_haploid



[13]
[14]
# ##############################################################################
# Flags uninformative markers (with a MAF of 0). This step only flags markers.
# You might want to exclude them later on.
@@ -52,7 +56,7 @@ script = flag_maf_zero



[14]
[15]
# ##############################################################################
# Flags markers that fail HWE test for a p value of 1e-4 and after Bonferroni
# correction. This step only flags markers. You might want to exclude them later
Binary file not shown.
11 changes: 10 additions & 1 deletion docs/automatic_report.rst
Original file line number Diff line number Diff line change
@@ -25,7 +25,7 @@ Report merging
==============

On a typical data clean up pipeline, multiple directories will be created (one
for each of the parts of the pipeline. A script is provided to merge all those
for each of the parts of the pipeline). A script is provided to merge all those
reports (one per ``data_clean_up.YYYY-MM-DD_HH.MM.SS`` directory) into a single
report. Here is the usage of this script:

@@ -76,3 +76,12 @@ To execute the report merging procedure, perform the following command:
background of the automatic report by using the ``--report-title``,
``--report-author``, ``--report-number`` or ``--report-background``
options, respectively.


Once again, to compile the final report, perform the following command:

.. code-block:: console

    $ pdflatex pyGenClean_report.tex
    $ pdflatex pyGenClean_report.tex
    $ pdflatex pyGenClean_report.tex
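The documentation above runs `pdflatex` three times on the same file; repeated passes are commonly needed so that LaTeX can resolve cross-references and tables of contents. As a minimal sketch, the same sequence could be scripted as follows. The `compile_report` helper and its `run` parameter are hypothetical and are not part of pyGenClean.

```python
import subprocess


def compile_report(tex_file, passes=3, run=subprocess.run):
    """Run pdflatex `passes` times on `tex_file`, stopping on the first failure."""
    commands = []
    for _ in range(passes):
        command = ["pdflatex", tex_file]
        # check=True makes subprocess.run raise CalledProcessError
        # as soon as one pass exits with a non-zero status.
        run(command, check=True)
        commands.append(command)
    return commands


if __name__ == "__main__":
    # Requires pdflatex on the PATH and the report in the current directory.
    compile_report("pyGenClean_report.tex")
```

The `run` parameter exists only so the helper can be exercised without a LaTeX installation, for instance by passing a function that records the commands instead of executing them.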
4 changes: 2 additions & 2 deletions docs/configuration_files.rst
Original file line number Diff line number Diff line change
@@ -28,7 +28,7 @@ If you want to generate the gender and BAF and LRR plots, you will require to
provide the intensities (``sex-chr-intensities`` and ``lrr-baf-raw-dir`` in the
``sex_check`` section (``[7]``) after uncommenting the required options).

.. literalinclude:: _static/configuration_files/configuration_example_1_of_2.conf
.. literalinclude:: _static/configuration_files/configuration_example_1_of_2.ini
    :linenos:
    :language: lighttpd

@@ -46,7 +46,7 @@ A file containing the samples and markers to be removed should be created using
the output of the ``sex_check``, ``find_related_samples``, ``check_ethnicity``
and ``plate_bias`` sections of the :ref:`first_conf_file`.

.. literalinclude:: _static/configuration_files/configuration_example_2_of_2.conf
.. literalinclude:: _static/configuration_files/configuration_example_2_of_2.ini
    :linenos:
    :language: lighttpd
