Commit

Merge branch 'main' into feature/wgs-preprocessing
endast committed Jan 30, 2024
2 parents 9b2371c + 21a8e46 commit 2791489
Showing 4 changed files with 16 additions and 31 deletions.
2 changes: 1 addition & 1 deletion docs/annotations.md
@@ -22,7 +22,7 @@ BCFtools as well as HTSlib should be installed on the machine,

will be installed by the pipeline together with the [plugins](https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html) for primateAI and spliceAI. Annotation data for CADD, spliceAI and primateAI should be downloaded. The path to the data may be specified in the corresponding [config file](https://github.com/PMBio/deeprvat/blob/main/pipelines/config/deeprvat_annotation_config.yaml).
Download path:
-- [CADD](http://cadd.gs.washington.edu/download): "All possible SNVs of GRCh38/hg38" and "gnomad.genomes.r3.0.indel.tsv.gz" incl. their Tabix Indices
+- [CADD](https://cadd.bihealth.org/download): "All possible SNVs of GRCh38/hg38" and "gnomad.genomes.r3.0.indel.tsv.gz" incl. their Tabix Indices
- [SpliceAI](https://basespace.illumina.com/s/otSPW8hnhaZR): "genome_scores_v1.3"/"spliceai_scores.raw.snv.hg38.vcf.gz" and "spliceai_scores.raw.indel.hg38.vcf.gz"
- [PrimateAI](https://basespace.illumina.com/s/yYGFdGih1rXL) PrimateAI supplementary data/"PrimateAI_scores_v0.2_GRCh38_sorted.tsv.bgz"
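
Once the files listed above are downloaded, a quick check that they are all in place can save a failed pipeline run. The following is a minimal sketch, not part of the repository: the function name, the flat directory layout, and the `.tbi` index suffix for every file are assumptions (the list above mentions Tabix indices only for the CADD files).

```python
from pathlib import Path

# File names quoted in the list above. The CADD SNV file is left out because
# the list gives only its description, not its file name.
ANNOTATION_FILES = [
    "gnomad.genomes.r3.0.indel.tsv.gz",
    "spliceai_scores.raw.snv.hg38.vcf.gz",
    "spliceai_scores.raw.indel.hg38.vcf.gz",
    "PrimateAI_scores_v0.2_GRCh38_sorted.tsv.bgz",
]


def missing_annotation_files(data_dir, names=ANNOTATION_FILES, index_suffix=".tbi"):
    """Return the data files and index files that are not found in data_dir.

    Assumes (hypothetically) that all files sit directly in data_dir and that
    each one has an index named <file><index_suffix> next to it.
    """
    data_dir = Path(data_dir)
    missing = []
    for name in names:
        if not (data_dir / name).is_file():
            missing.append(name)
        if not (data_dir / (name + index_suffix)).is_file():
            missing.append(name + index_suffix)
    return missing
```

Calling `missing_annotation_files("path/to/annotation_data")` returns an empty list when every file and index is present; the path here is a placeholder for whatever location is set in the annotation config file.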

26 changes: 9 additions & 17 deletions docs/preprocessing.md
@@ -50,17 +50,18 @@ An example file is included in this repo: [example config](https://github.com/PM
# What chromosomes should be processed
included_chromosomes : [21,22]

# The format of the name of the "raw" vcf files
vcf_files_list: vcf_files_list.txt

# Number of threads to use in the preprocessing script, separate from snakemake threads
preprocess_threads: 16

# If you need to run a cmd to load bcf and samtools specify it here, see example
bcftools_load_cmd : # module load bcftools/1.10.2 &&
samtools_load_cmd : # module load samtools/1.9 &&

# Path to where you want to write results and intermediate data
working_dir: workdir
# Path to ukbb data
data_dir: data

# These paths are all relative to the data dir
metadata_dir_name: metadata

# These paths are all relative to the working dir
# Here will the finished preprocessed files end up
@@ -82,25 +83,16 @@ convert2bed_max_mem: 64G
# Increase the BED entry by the same number base pairs in each direction
region_expand: 3000

# The format of the name of the "raw" vcf files
vcf_files_list: vcf_files_list.txt

# Number of threads to use in the preprocessing script, separate from snakemake threads
preprocess_threads: 16

# You can specify a different zcat cmd for example gzcat here, default zcat
zcat_cmd:

```
The config above would use the following directory structure:
```shell

parent_directory
|-- data
| |-- metadata
| `-- vcf
`-- workdir
-- workdir
|-- norm
| |-- bcf
| |-- sparse
17 changes: 6 additions & 11 deletions pipelines/config/deeprvat_preprocess_config.yaml
@@ -1,17 +1,18 @@
# What chromosomes should be processed
included_chromosomes : [21,22]

# The format of the name of the "raw" vcf files
vcf_files_list: vcf_files_list.txt

# Number of threads to use in the preprocessing script, separate from snakemake threads
preprocess_threads: 16

# If you need to run a cmd to load bcf and samtools specify it here, see example
bcftools_load_cmd : # module load bcftools/1.10.2 &&
samtools_load_cmd : # module load samtools/1.9 &&

# Path to where you want to write results and intermediate data
working_dir: workdir
# Path to ukbb data
data_dir: data

# These paths are all relative to the data dir
metadata_dir_name: metadata

# These paths are all relative to the working dir
# Here will the finished preprocessed files end up
@@ -33,11 +34,5 @@ convert2bed_max_mem: 64G
# Increase the BED entry by the same number base pairs in each direction
region_expand: 3000

# The format of the name of the "raw" vcf files
vcf_files_list: vcf_files_list.txt

# Number of threads to use in the preprocessing script, separate from snakemake threads
preprocess_threads: 16

# You can specify a different zcat cmd for example gzcat here, default zcat
zcat_cmd:
2 changes: 0 additions & 2 deletions pipelines/preprocessing/preprocess.snakefile
@@ -11,9 +11,7 @@ zcat_cmd = config.get("zcat_cmd") or "zcat"
preprocessing_cmd = "deeprvat_preprocess"

working_dir = Path(config["working_dir"])
-data_dir = Path(config["data_dir"])
preprocessed_dir = working_dir / config["preprocessed_dir_name"]
-metadata_dir = data_dir / config["metadata_dir_name"]
reference_dir = working_dir / config["reference_dir_name"]

preprocess_threads = config["preprocess_threads"]
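
Going by the "0 additions & 2 deletions" count for this file and the config hunks that drop `data_dir` and `metadata_dir_name`, the snakefile appears to stop reading those two keys after this merge. The sketch below illustrates the resulting path resolution outside Snakemake. It is an assumption-laden illustration, not the pipeline's code: the plain `config` dict stands in for Snakemake's `config` object, `resolve_paths` is a hypothetical helper, and the values `"preprocessed"` and `"reference"` are made up because the diff does not show them.

```python
from pathlib import Path

# Hypothetical stand-in for the `config` dict that Snakemake provides.
# The two *_dir_name values are placeholders; the diff does not show them.
config = {
    "working_dir": "workdir",
    "preprocessed_dir_name": "preprocessed",
    "reference_dir_name": "reference",
    "preprocess_threads": 16,
    "zcat_cmd": None,  # an empty `zcat_cmd:` entry in YAML loads as None
}


def resolve_paths(config):
    """Build the directories the snakefile still derives from the config.

    Everything hangs off working_dir; data_dir and metadata_dir are not used.
    """
    working_dir = Path(config["working_dir"])
    return {
        "working_dir": working_dir,
        "preprocessed_dir": working_dir / config["preprocessed_dir_name"],
        "reference_dir": working_dir / config["reference_dir_name"],
    }


paths = resolve_paths(config)

# Same fallback as in the snakefile: a missing or empty value means "zcat".
zcat_cmd = config.get("zcat_cmd") or "zcat"
```

With this layout a config file no longer needs `data_dir` or `metadata_dir_name`; whether stale copies of those keys are simply ignored is an assumption based on the lines shown here.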
