BCF entry point with intensity and contamination checks using BCF for data_catalog usage. #314

Merged: 36 commits, Sep 25, 2024

Commits
143eb80
makes illumina_manifest_file optional in the config
May 28, 2024
8a28efb
Uses snps_array name instead of bpm to name *.abf.txt
May 29, 2024
36cfdab
Adds a new idat_intensity workflow
Jun 10, 2024
3c8259e
Adds vcf2abf.py and vcf2adpc.py as prep scripts for contamination che…
Jul 15, 2024
a6cb033
Creates a BCF entry point.
Jul 15, 2024
0146c1b
Removes IDAT intensity checks from contamination.smk
Jul 15, 2024
0dd09f1
Adds rules contamination.smk to do contamination checks with bcf input.
Jul 16, 2024
81697d6
Enables contamination check with bcf input in cluster mode.
Jul 16, 2024
56dd245
Adds capability to infer median intensity directly from input BCF/VCF.
Jul 22, 2024
0aaaa9b
Adds the distinction of contamination and intensity checks in the sna…
Jul 22, 2024
0dcbaf2
Correcting conditions such with BCF input intensity and contamination…
Aug 16, 2024
df4e9f2
Renames variable vcf_file to bcf_file when bcf file is referred.
Sep 11, 2024
52f4a89
Removes the duplicated subworkflow file for intensity checks.
Sep 12, 2024
f8ba252
Updates the docstring in contamination.smk to indicate when BCF file …
Sep 12, 2024
20c0a47
Renames input variable to bcf_file.
Sep 12, 2024
fa2bcf5
Imports snakemake params all at once in grouped_contamination.py
Sep 12, 2024
967ff88
Removes typo duplicated header in entry_points
Sep 13, 2024
0224aae
Shifts to using IGC score instead of gentrain score in vcf2adpc.py fo…
Sep 13, 2024
53b4e33
Adds a config suffix validator for bcf_file.
Sep 13, 2024
416f266
Removes some debugging lines from vcf2abf.py
Sep 13, 2024
bbe5006
Shifts to using IGC as genotype score for preparing abf using vcf2abf.
Sep 13, 2024
76c8304
Removes commented lines from vcf2adpc.py and vcf2abf.py
Sep 16, 2024
d840118
fixes a black syntax issue in importing snakemake params in grouped_c…
Sep 16, 2024
2c52669
Modifies vcf2abf to not use gentrain score in preparing abf.
Sep 20, 2024
4e993bc
Includes a None case in the BCF config validator to work with unit te…
Sep 23, 2024
605006c
Adds the app mode in median_intensity_from_vcf to work directly as st…
Sep 23, 2024
b69a3fc
Adds the new unit test scripts relevant to BCF entry
Sep 23, 2024
d1fd658
docs: updates docstring to show bcf as a user_file in config
Sep 24, 2024
5ef948d
docs: Updates the entry_point documentation to show BCF as an entry_p…
Sep 24, 2024
949443e
docs: Updates contamination subworkflow documentation to show process…
Sep 24, 2024
f28fc9c
docs: Adds the documentation for new intensity subworkflow and shows …
Sep 24, 2024
d42a035
docs: Updates index to include intensity_check subworkflow.
Sep 24, 2024
3ea4d4c
docs: corrects spellings in documentation
Sep 25, 2024
4bcf9e6
Removes jupyter comment lines from scripts.
Sep 25, 2024
37de67c
docs: updates the docstring for user_files in config section to inclu…
Sep 25, 2024
58ed006
docs: redo updates the user_files docstring to include BCF.
Sep 25, 2024
1 change: 1 addition & 0 deletions docs/index.rst
@@ -29,6 +29,7 @@ This lets us take advantage of snakemake_'s amazing workflow management system,
    :maxdepth: 1

    sub_workflows/entry_points
+   sub_workflows/intensity_check
    sub_workflows/contamination
    sub_workflows/sample_qc
    sub_workflows/subject_qc
80 changes: 80 additions & 0 deletions docs/static/bcf_contamination.svg
44 changes: 44 additions & 0 deletions docs/static/bcf_intensity.svg
80 changes: 80 additions & 0 deletions docs/static/gtc_contamination.svg
56 changes: 56 additions & 0 deletions docs/static/idat_intensity.svg
31 changes: 16 additions & 15 deletions docs/sub_workflows/contamination.rst
@@ -6,27 +6,28 @@ Contamination Sub-workflow
 **Workflow File**:
 https://github.com/NCI-CGR/GwasQcPipeline/blob/default/src/cgr_gwas_qc/workflow/sub_workflows/contamination.smk

+**Major Outputs**:
+
+- ``sample_level/<BPM Prefix>.<software_params.contam_population>.abf.txt`` B allele frequencies from the 1000 genomes.
+- ``sample_level/contamination/verifyIDintensity.csv`` aggregated table of contamination scores.
+
 **Config Options**: see :ref:`config-yaml` for more details

 - ``reference_files.thousand_genome_vcf``
 - ``reference_files.thousand_genome_tbi``
-- ``user_files.gtc_pattern``
-- ``user_files.idat_pattern``
+- ``user_files.bcf`` or ( ``reference_files.illumina_manifest_file`` and ``user_files.gtc_pattern`` )
 - ``software_params.contam_population``

-**Major Outputs**:
-
-- ``sample_level/<BPM Prefix>.<software_params.contam_population>.abf.txt`` B allele frequencies from the 1000 genomes.
-- ``sample_level/contamination/median_idat_intensity.csv`` aggregated table of median IDAT intensities.
-- ``sample_level/contamination/verifyIDintensity.csv`` aggregated table of contamination scores.
+|bcf_input_contamination| |gtc_input_contamination|

+.. |gtc_input_contamination| image:: ../static/gtc_contamination.svg
+   :width: 45%

-.. figure:: ../static/contamination.png
-   :name: fig-contamination-workflow
+.. |bcf_input_contamination| image:: ../static/bcf_contamination.svg
+   :width: 45%

-   The contamination sub-workflow.
-   This workflow will estimate contamination using verifyIDintensity on each sample individually.
-   It requires that you have GTC/IDAT files.
-   It first pulls B-allele frequencies from the 1000 Genomes VCF file.
-   It then estimate contamination for each sample and aggregates these results.
-   Finally, it also estimates the per sample median IDAT intensity, which is used to filter contamination results in the :ref:`sample-qc`
+The contamination sub-workflow.
+This workflow will estimate contamination using verifyIDintensity on each sample individually.
+It requires that you have aggregated BCF or GTC files.
+It first pulls B-allele frequencies from the 1000 Genomes VCF file.
+It then estimates contamination for each sample and aggregates these results.
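
The updated config options state that the contamination check needs either ``user_files.bcf`` or the pair ``reference_files.illumina_manifest_file`` plus ``user_files.gtc_pattern``. A rough sketch of that condition over a plain config dictionary is given below. The function name ``contamination_input_mode`` is hypothetical, and preferring BCF when both inputs are present is an assumption, since the documentation does not say how the workflow resolves that case.

```python
from typing import Any, Mapping, Optional


def contamination_input_mode(cfg: Mapping[str, Any]) -> Optional[str]:
    """Return "bcf", "gtc", or None for a parsed config dictionary.

    Hypothetical helper: the real decision is made inside contamination.smk
    and may be expressed differently.
    """
    user_files = cfg.get("user_files") or {}
    reference_files = cfg.get("reference_files") or {}

    # Assumption: BCF wins if both kinds of input are configured.
    if user_files.get("bcf"):
        return "bcf"
    # The GTC path needs both the Illumina manifest and the GTC pattern.
    if reference_files.get("illumina_manifest_file") and user_files.get("gtc_pattern"):
        return "gtc"
    # Neither requirement is met, so the contamination check cannot run.
    return None
```

Returning ``None`` rather than raising lets a caller skip the contamination rules when neither input is available.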
4 changes: 3 additions & 1 deletion docs/sub_workflows/entry_points.rst
@@ -15,14 +15,15 @@ Entry Points Sub-workflow
 - ``user_files.bed``
 - ``user_files.bim``
 - ``user_files.fam``
+- ``user_files.bcf``

 **Major Outputs**:

 - ``sample_level/samples.bed``
 - ``sample_level/samples.bim``
 - ``sample_level/samples.fam``

-There are three paths we can take to create these files:
+There are four paths we can take to create these files:

 1. If GTC files are provided using ``user_files.gtc_pattern`` then we will

@@ -34,3 +35,4 @@ There are three paths we can take to create these files:

 2. If an aggregated PED/MAP is provided using ``user_files.ped`` and ``user_files.map`` then we will convert the PED/MAP to BED/BIM/FAM.
 3. If an aggregated BED/BIM/FAM is provided using ``user_files.bed``, ``user_files.bim``, ``user_files.fam`` then we will create a symbolic link.
+4. If an aggregated BCF file is provided using ``user_files.bcf`` then we will convert the BCF to BED/BIM/FAM.
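
The four entry points listed above can be summarized as a table of required ``user_files`` keys. The sketch below reports which entry points a given ``user_files`` mapping satisfies. The names ``ENTRY_POINTS`` and ``matching_entry_points`` are hypothetical, and the sketch deliberately does not pick a winner when several match, because the documentation does not state a precedence.

```python
from typing import Any, Dict, List, Mapping, Tuple

# Required user_files keys per entry point, in the order the docs list them.
ENTRY_POINTS: Dict[str, Tuple[str, ...]] = {
    "gtc": ("gtc_pattern",),
    "ped_map": ("ped", "map"),
    "bed_bim_fam": ("bed", "bim", "fam"),
    "bcf": ("bcf",),
}


def matching_entry_points(user_files: Mapping[str, Any]) -> List[str]:
    """Return every entry point whose required keys are all set (truthy)."""
    return [
        name
        for name, required in ENTRY_POINTS.items()
        if all(user_files.get(key) for key in required)
    ]
```

An empty result means no entry point is fully configured; more than one result means the config is ambiguous and the pipeline's own rules decide which path runs.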