rebase 326 #330

Merged
merged 39 commits into from
Sep 26, 2024

Conversation

carynwillis
Collaborator

@carynwillis carynwillis commented Sep 26, 2024

Rebases issue_326 to default

jaamarks and others added 30 commits September 25, 2024 15:15
Avoids dependency of bpm to name allele B frequencies (abf) file.
to separate median idat intensity retrieval from verifyIDintensity bundled into contamination.smk
Modifies entry_points.smk to create a BCF entry point by simply converting BCF to PLINK BED. Testing and validation are yet to be done.
A previous commit puts them in a separate idat_intensity.smk
Modifies contamination.smk and grouped_contamination.py to enable contamination check in cluster mode.
Avoids the dependency on IDAT files for calculating median intensity with VCF/BCF input.
Adds scripts for both per-sample and grouped/cluster modes.
Modifies the intensity workflow to execute appropriately with VCF/BCF input.
…kefile and sample_qc subworkflow

In addition to the existing 'use_contamination' checks, also adds 'intensity_retreived' and 'contamination_checked'
tests, which simply check whether the output CSV files were created, regardless of config or entry point, so that they can be fed
to sample_qc.
Removes idat_intensity.smk and keeps intensity_check.smk
Renames vcf_file to bcf_file to explicitly indicate that BCF is the input.
Snakemake params were previously imported by name. Changed to import all params through a loop, without naming them individually. This allows seamless compatibility when the gtc or bcf entry point is used.
The first few lines of entry_points were duplicated by a copy/paste. This removes the duplicated lines.
…r contamination checks.

Previously GC_SCORE was added to the adpc.bin, which carried the dependency that a cluster egt file had to be used when preparing the vcf/bcf. The IGC score is encoded in the gtc, so it doesn't depend on a cluster egt file. The contamination scores should also be more similar to those obtained with gtc input.
Consistent with vcf2adpc.py. Now both should use IGC and work with vcf/bcf prepared with the gtc2vcf workflow.
Earlier, the gentrain score was used to mark AF as NA if the score was negative. This change stops using it to ensure compatibility with bcf prepared by the gtc2vcf workflow.
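
The switch from named to looped param imports described in the commits above can be sketched roughly as follows. This is only an illustration, not the code from this PR: `collect_params` and the parameter names in the two dictionaries are hypothetical, and plain dicts stand in for the params object a Snakemake script would receive.

```python
def collect_params(params):
    """Copy every entry of a params mapping into a plain dict.

    Looping over whatever is present, instead of naming each param,
    means the script does not break when an entry point supplies a
    different set of params.
    """
    collected = {}
    for name, value in params.items():
        collected[name] = value
    return collected


# Hypothetical stand-ins for the params seen under two entry points.
gtc_params = {"snps": "snps.txt", "abf_file": "sample.abf.txt"}
bcf_params = {"bcf_file": "samples.bcf", "abf_file": "sample.abf.txt"}

for entry_point, params in (("gtc", gtc_params), ("bcf", bcf_params)):
    kwargs = collect_params(params)
    print(entry_point, sorted(kwargs))
```

With named imports, a script written for the gtc entry point would fail on a param that only the bcf entry point defines; the loop simply forwards whatever is there.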
@carynwillis carynwillis self-assigned this Sep 26, 2024
@carynwillis carynwillis merged commit eec673b into issue_326 Sep 26, 2024
2 checks passed
@carynwillis carynwillis changed the title from "bugfix: Convert missing ancestry to Other category for grafpop output (Issue 326)" to "rebase 326" Sep 26, 2024

2 participants