Merge pull request #330 from NCI-CGR/default
bugfix: Convert missing ancestry to Other category for grafpop output (Issue 326)
carynwillis authored Sep 26, 2024
2 parents 03770cf + 24c0fb2 commit eec673b
Showing 27 changed files with 1,387 additions and 247 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -29,6 +29,7 @@ This lets us take advantage of snakemake_'s amazing workflow management system,
:maxdepth: 1

sub_workflows/entry_points
sub_workflows/intensity_check
sub_workflows/contamination
sub_workflows/sample_qc
sub_workflows/subject_qc
80 changes: 80 additions & 0 deletions docs/static/bcf_contamination.svg
44 changes: 44 additions & 0 deletions docs/static/bcf_intensity.svg
80 changes: 80 additions & 0 deletions docs/static/gtc_contamination.svg
56 changes: 56 additions & 0 deletions docs/static/idat_intensity.svg
31 changes: 16 additions & 15 deletions docs/sub_workflows/contamination.rst
@@ -6,27 +6,28 @@ Contamination Sub-workflow
**Workflow File**:
https://github.com/NCI-CGR/GwasQcPipeline/blob/default/src/cgr_gwas_qc/workflow/sub_workflows/contamination.smk

**Major Outputs**:

- ``sample_level/<BPM Prefix>.<software_params.contam_population>.abf.txt`` B allele frequencies from the 1000 genomes.
- ``sample_level/contamination/verifyIDintensity.csv`` aggregated table of contamination scores.

**Config Options**: see :ref:`config-yaml` for more details

- ``reference_files.thousand_genome_vcf``
- ``reference_files.thousand_genome_tbi``
- ``user_files.gtc_pattern``
- ``user_files.idat_pattern``
- ``user_files.bcf`` or ( ``reference_files.illumina_manifest_file`` and ``user_files.gtc_pattern`` )
- ``software_params.contam_population``

**Major Outputs**:

- ``sample_level/<BPM Prefix>.<software_params.contam_population>.abf.txt`` B allele frequencies from the 1000 genomes.
- ``sample_level/contamination/median_idat_intensity.csv`` aggregated table of median IDAT intensities.
- ``sample_level/contamination/verifyIDintensity.csv`` aggregated table of contamination scores.
|bcf_input_contamination| |gtc_input_contamination|

.. |gtc_input_contamination| image:: ../static/gtc_contamination.svg
   :width: 45%

.. figure:: ../static/contamination.png
   :name: fig-contamination-workflow
.. |bcf_input_contamination| image:: ../static/bcf_contamination.svg
   :width: 45%

The contamination sub-workflow.
This workflow will estimate contamination using verifyIDintensity on each sample individually.
It requires that you have GTC/IDAT files.
It first pulls B-allele frequencies from the 1000 Genomes VCF file.
It then estimate contamination for each sample and aggregates these results.
Finally, it also estimates the per sample median IDAT intensity, which is used to filter contamination results in the :ref:`sample-qc`
The contamination sub-workflow.
This workflow will estimate contamination using verifyIDintensity on each sample individually.
It requires that you have aggregated BCF or GTC files.
It first pulls B-allele frequencies from the 1000 Genomes VCF file.
It then estimates contamination for each sample and aggregates these results.
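
The input requirement listed under Config Options (``user_files.bcf``, or ``reference_files.illumina_manifest_file`` together with ``user_files.gtc_pattern``) can be sketched as a small check. This is a minimal illustration and not the pipeline's actual code: the function name ``contamination_input_mode``, the returned labels, and the choice to prefer the BCF when both inputs are set are assumptions made for the example.

```python
from typing import Any, Mapping


def contamination_input_mode(config: Mapping[str, Any]) -> str:
    """Return "bcf" or "gtc" depending on which inputs are configured.

    Hypothetical helper: it mirrors the documented requirement, not the
    workflow's real implementation.
    """
    user_files = config.get("user_files") or {}
    reference_files = config.get("reference_files") or {}

    # Assumption: an aggregated BCF takes precedence when both are present.
    if user_files.get("bcf"):
        return "bcf"

    # The GTC path needs both the GTC pattern and the Illumina manifest.
    if user_files.get("gtc_pattern") and reference_files.get("illumina_manifest_file"):
        return "gtc"

    raise ValueError(
        "Contamination needs user_files.bcf, or both "
        "reference_files.illumina_manifest_file and user_files.gtc_pattern."
    )
```

A config that sets only ``user_files.gtc_pattern`` without the manifest would raise ``ValueError`` in this sketch, which matches the parenthesised "and" in the option list.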
4 changes: 3 additions & 1 deletion docs/sub_workflows/entry_points.rst
@@ -15,14 +15,15 @@ Entry Points Sub-workflow
- ``user_files.bed``
- ``user_files.bim``
- ``user_files.fam``
- ``user_files.bcf``

**Major Outputs**:

- ``sample_level/samples.bed``
- ``sample_level/samples.bim``
- ``sample_level/samples.fam``

There are three paths we can take to create these files:
There are four paths we can take to create these files:

1. If GTC files are provided using ``user_files.gtc_pattern`` then we will

@@ -34,3 +35,4 @@ There are three paths we can take to create these files:

2. If an aggregated PED/MAP is provided using ``user_files.ped`` and ``user_files.map`` then we will convert the PED/MAP to BED/BIM/FAM.
3. If an aggregated BED/BIM/FAM is provided using ``user_files.bed``, ``user_files.bim``, ``user_files.fam`` then we will create a symbolic link.
4. If an aggregated BCF file is provided using ``user_files.bcf`` then we will convert the BCF to BED/BIM/FAM.
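
The four paths above amount to a dispatch on which ``user_files`` keys are set. The following is a rough sketch under stated assumptions, not the workflow's real logic: the function name ``select_entry_point``, the returned labels, and the precedence (checked in the documented order 1 to 4) are all hypothetical.

```python
from typing import Any, Mapping


def select_entry_point(user_files: Mapping[str, Any]) -> str:
    """Pick which documented path would produce samples.bed/bim/fam.

    Hypothetical helper; the precedence follows the numbering in the docs,
    which is an assumption about how conflicts would be resolved.
    """
    # 1. Per-sample GTC files.
    if user_files.get("gtc_pattern"):
        return "gtc"

    # 2. Aggregated PED/MAP, converted to BED/BIM/FAM.
    if user_files.get("ped") and user_files.get("map"):
        return "ped_map"

    # 3. Aggregated BED/BIM/FAM, symbolically linked.
    if all(user_files.get(key) for key in ("bed", "bim", "fam")):
        return "bed_bim_fam"

    # 4. Aggregated BCF, converted to BED/BIM/FAM.
    if user_files.get("bcf"):
        return "bcf"

    raise ValueError("No supported entry point is configured in user_files.")
```

Incomplete groups (for example ``bed`` and ``bim`` without ``fam``) fall through to the next check in this sketch rather than being treated as a partial match.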