Update Snakemake 8 and Gather/Scatter Indel Calling (#13)
* added pysam

* current changes to changelog

* implemented scatter and gather for first round htc

* removed optional quantification - should be required

* caught error when unknown bases occur in wildtype

* remove unwanted print

* added routine to select mhc class type

* splitting in mutect2 analysis for speedup

* rename rule name

* combine single-end and paired-end reads to prepare input for mhc-II genotyping

* added instructions for Snakemake 8

* updated minimum version of Snakemake to 8.x.x

* gather scatter for indel calling

* added instructions for Snakemake 8; apptainer replaces singularity

* added routine to ease the use of custom variants

* refactor hlatyping to combine read retrieval for MHC-I and MHC-II

* outsource rules for custom variants to improve readability

* added reference sets for hla alleles (to compare against)

* added separate rules for MHC-II prediction tools download

* accept wildcard <group> as parameter to improve usability

* Removed check for valid alleles - this is now done later to also include user-provided ones

* change to single file input

* add routine for MHC-I and MHC-II into same script

* add safety routine if no counts can be found (when no seqdata present)

* added custom rules

* added parameters for alignment to config

* changed order when adding INFO tags

* added sorting routine

* safety routines added

* outsource merging of predicted MHC-II alleles

* added a few parameters

* added to feature list

* changed path to provided hlahd path

* hlahd call as non-file parameter

* added changes to path also to testconfig
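The scatter/gather approach named in the commit message can be illustrated in plain Python. This is a minimal sketch, not ScanNeo2's implementation. It assumes one scatter unit per contig listed in the reference FASTA index (`resources/refs/genome.fasta.fai`, the index used by the STAR rules). The helper names `chromosomes_from_fai` and `gather_in_reference_order` are hypothetical.

```python
def chromosomes_from_fai(fai_text):
    """Return contig names from the text of a samtools-style FASTA index.

    Each .fai line is tab-separated (name, length, offset, linebases,
    linewidth); only the first column is needed to define one scatter
    unit per contig.
    """
    names = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        names.append(line.split("\t", 1)[0])
    return names


def gather_in_reference_order(per_chrom_outputs, chroms):
    """Return per-contig outputs ordered like the reference contigs.

    Raises ValueError if any scatter unit produced no output, so a missing
    chromosome cannot silently drop out of the merged result.
    """
    missing = [c for c in chroms if c not in per_chrom_outputs]
    if missing:
        raise ValueError(f"missing scatter outputs for: {', '.join(missing)}")
    return [per_chrom_outputs[c] for c in chroms]
```

Inside a Snakefile, the returned names could be passed to `expand()` to create one per-chromosome target each, with a gather rule consuming all of them in reference order.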
riasc authored Mar 1, 2024
1 parent 2928171 commit 7145dc4
Showing 30 changed files with 4,927 additions and 456 deletions.
4 changes: 2 additions & 2 deletions .tests/integration/config_basic/config.yaml
@@ -18,7 +18,6 @@ data:
hlatyping:
MHC-I:
MHC-II:
readgroups:

### pre-processing (only applied on fastq reads)
preproc:
@@ -80,14 +79,15 @@ indel:
sliprate: 0.1 # frequency of slippage when it is suspected

quantification:
activate: true
mode: BOTH # DNA, RNA or BOTH

hlatyping:
class: I # I, II or BOTH
# specific path for class II hlatyping (only required when class: II, or BOTH)
MHC-I_mode: DNA, RNA # DNA, RNA, or BOTH (if empty alleles have to be specified in custom)
MHC-II_mode: BOTH # DNA, RNA, or BOTH (if empty alleles have to be specified in custom)

hlahd_path: ./hlahd.1.7.0/
freqdata: ./hlahd_files/freq_data/
split: ./hlahd_files/HLA_gene.split.txt
dict: ./hlahd_files/dictionary/
24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,30 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.0] - 2024-02-25

### Features

- ScanNeo2 supports Snakemake >= 8
- --use-conda replaced by --software-deployment-method conda
- --use-singularity replaced by --software-deployment-method apptainer
- Gather/scatter of the indel calling speeds up ScanNeo2 on multiple cores
- added script to split BAM files by chromosome (scripts/split_bam_by_chr.py)
- HaplotypeCaller first/final round is done per chromosome and later merged
- Mutect2 is done per chromosome and later merged
- MHC-II genotyping now works on both single-end and paired-end reads
- User-defined HLA alleles are matched against the HLA reference set
- Added multiple routines to catch errors when only custom variants are provided
- Added additional parameters in config file

### Fix

- When using BAM files, HLA typing wrongly expected single-end reads and performed preprocessing
- Each environment is now thoroughly versioned to ensure interoperability
- Missing immunogenicity calculation on certain values of MHC-I fixed
- Fixed prediction of binding affinity in MHC-II (as the columns are different from MHC-I)


## [0.1.6] - 2024-02-13

### Fix
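The changelog says HaplotypeCaller and Mutect2 now run per chromosome and their results are merged afterwards. Below is a minimal sketch of such a gather step for plain-text VCFs. It assumes all inputs share the same sample columns and are passed in reference order. `merge_vcf_texts` is hypothetical. For real data, a dedicated tool such as `bcftools concat` or GATK `MergeVcfs` is the safer choice, and this sketch does not claim to mirror how ScanNeo2 merges its files.

```python
def merge_vcf_texts(vcf_texts):
    """Concatenate per-chromosome VCF texts into one VCF text.

    The header ('##' meta lines and the '#CHROM' line) is taken from the
    first input; data records are appended in the order the inputs are
    given, so inputs in reference order yield a coordinate-sorted result.
    """
    if not vcf_texts:
        raise ValueError("nothing to merge")
    header = [ln for ln in vcf_texts[0].splitlines() if ln.startswith("#")]
    columns = [ln for ln in header if ln.startswith("#CHROM")]
    body = []
    for i, text in enumerate(vcf_texts, start=1):
        lines = text.splitlines()
        # Sample columns must agree, otherwise records cannot be stacked.
        if [ln for ln in lines if ln.startswith("#CHROM")] != columns:
            raise ValueError(f"#CHROM line of input {i} differs from input 1")
        # Drop this input's header and blank lines; keep its records.
        body.extend(ln for ln in lines if ln and not ln.startswith("#"))
    return "\n".join(header + body) + "\n"
```

Only the `#CHROM` line is compared because per-interval runs can legitimately differ in their `##` meta lines, for example in recorded command lines.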
29 changes: 21 additions & 8 deletions README.md
@@ -1,6 +1,6 @@
<div align="left">
<h1>ScanNeo2</h1>
<img src="https://img.shields.io/badge/snakemake-≥6.4.1-brightgreen.svg">
<img src="https://img.shields.io/badge/snakemake-≥8.0.0-brightgreen.svg">
<img src="https://github.com/ylab-hi/ScanNeo2/actions/workflows/linting.yml/badge.svg" alt="Workflow status badge">
</div>

@@ -29,9 +29,10 @@ To get started with ScanNeo2, follow the steps below:
mamba activate scanneo2
```

Note: This installs Snakemake v7.32.x. In its current form, ScanNeo2 is not comptabile with Snakemake >= 8.x.x.
If ScanNeo2 is configured to use the exitron module, singularity needs to be installed. For that, the
`environment_singularity.yml' can be used. However, most HPC servers provide their own module installation.
Note: ScanNeo2 requires Snakemake >= 8.0.0 and is not compatible with earlier versions. If ScanNeo2
is configured to use the exitron module, apptainer (formerly singularity) needs to be installed.
For that, `environment_apptainer.yml` can be used. However, most HPC servers provide their own
module installation (which should be preferred).

2. Deploy ScanNeo2:

@@ -66,13 +67,13 @@ To run the workflow, use the following command:
```bash
cd /path/to/your/working/directory/
snakemake --cores all --use-conda
snakemake --cores all --software-deployment-method conda
```

As mentioned above, when exitron detection is activated the singularity option `--use-singularity` has to be used as well.
As mentioned above, when exitron detection is activated, apptainer has to be enabled as well by adding `apptainer` to `--software-deployment-method`.

```bash
snakemake --cores all --use-conda --use-singularity
snakemake --cores all --software-deployment-method conda apptainer
```

In addition, custom configfiles can be configured using `--configfile <path/to/configfile>`. In principle, this merely
@@ -101,7 +102,19 @@ ScanNeo2 provides an accessible, efficient method for predicting neoantigens. It

## Citation

If ScanNeo2 has proven useful in your work, please cite the following publication:
@article{Schafer2023Nov,
author = {Sch{\ifmmode\ddot{a}\else\"{a}\fi}fer, Richard A. and Guo, Qingxiang and Yang, Rendong},
title = {{ScanNeo2: a comprehensive workflow for neoantigen detection and immunogenicity prediction from diverse genomic and transcriptomic alterations}},
journal = {Bioinformatics},
volume = {39},
number = {11},
pages = {btad659},
year = {2023},
month = nov,
issn = {1367-4811},
publisher = {Oxford Academic},
doi = {10.1093/bioinformatics/btad659}
}

## License

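The README change amounts to a mechanical translation of command-line flags. Below is a small sketch of that mapping. It assumes only the two legacy flags listed in the changelog (`--use-conda` and `--use-singularity`). `to_snakemake8_args` is hypothetical and is not a Snakemake feature. It collects all requested methods into a single `--software-deployment-method` option, mirroring the `conda apptainer` form used in the README.

```python
# Legacy Snakemake 7 flags and the deployment method that replaces each one,
# as listed in the ScanNeo2 0.2.0 changelog.
LEGACY_FLAGS = {
    "--use-conda": "conda",
    "--use-singularity": "apptainer",
}


def to_snakemake8_args(argv):
    """Replace legacy --use-* flags with one --software-deployment-method."""
    methods = []
    rest = []
    for arg in argv:
        if arg in LEGACY_FLAGS:
            method = LEGACY_FLAGS[arg]
            if method not in methods:  # keep first-seen order, no duplicates
                methods.append(method)
        else:
            rest.append(arg)
    if methods:
        rest += ["--software-deployment-method", *methods]
    return rest
```

The sketch does not merge its output with a `--software-deployment-method` option that is already present in the argument list.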
15 changes: 8 additions & 7 deletions config/config.yaml
@@ -16,7 +16,6 @@ data:
hlatyping:
MHC-I:
MHC-II:
readgroups:

### pre-processing (only applied on fastq reads)
preproc:
@@ -29,11 +28,11 @@ preproc:

### alignment
align:
minovlps: 10
chimsegmin: 20
chimoverhang: 10
chimmax: 50
chimmaxdrop: 30
chimSegmentMin: 20
chimScoreMin: 10
chimJunctionOverhangMin: 10
chimScoreDropMax: 30
chimScoreSeparation: 10

### variant calling
# alternative splicing
@@ -77,7 +76,6 @@ indel:
sliprate: 0.1 # frequency of slippage when it is suspected

quantification:
activate: true
mode: BOTH # DNA, RNA or BOTH

hlatyping:
@@ -86,6 +84,9 @@ hlatyping:
# specific path for class II hlatyping (only required when class: II, or BOTH)
MHC-I_mode: BOTH # DNA, RNA, or BOTH (if empty alleles have to be specified in custom)
MHC-II_mode: BOTH # DNA, RNA, or BOTH (if empty alleles have to be specified in custom)

# specific path for class II hlatyping (only required when class: II, or BOTH)
hlahd_path: ./hlahd.1.7.0/
freqdata: ./hlahd_files/freq_data/
split: ./hlahd_files/HLA_gene.split.txt
dict: ./hlahd_files/dictionary/
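The comments in the `hlatyping` section describe which values are accepted. Below is a minimal sketch of a check based only on those comments:

- `class` is one of I, II or BOTH.
- Each `*_mode` is DNA, RNA, BOTH, or empty.
- `hlahd_path` is needed when `class` is II or BOTH.

`check_hlatyping` is hypothetical. It takes the parsed section (for example `config["hlatyping"]`) as a dict, and ScanNeo2 may validate these values differently or elsewhere.

```python
ALLOWED_CLASSES = {"I", "II", "BOTH"}
ALLOWED_MODES = {"DNA", "RNA", "BOTH"}


def check_hlatyping(cfg):
    """Return a list of problems found in a parsed `hlatyping` section."""
    problems = []
    hla_class = str(cfg.get("class", "")).strip()
    if hla_class not in ALLOWED_CLASSES:
        problems.append(f"class must be one of I, II, BOTH (got {hla_class!r})")
    for key in ("MHC-I_mode", "MHC-II_mode"):
        mode = cfg.get(key)
        # An empty mode is allowed: alleles then have to be given in `custom`.
        if mode in (None, ""):
            continue
        if str(mode).strip() not in ALLOWED_MODES:
            problems.append(f"{key} must be DNA, RNA or BOTH (got {mode!r})")
    # The config comment states the HLA-HD path is only required for class II.
    if hla_class in {"II", "BOTH"} and not cfg.get("hlahd_path"):
        problems.append("hlahd_path is required when class is II or BOTH")
    return problems
```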
2 changes: 1 addition & 1 deletion environment.yml
@@ -4,5 +4,5 @@ channels:
- conda-forge
- anaconda
dependencies:
- snakemake=7.32.3
- snakemake=8.4.11
- snakemake-wrapper-utils
4 changes: 3 additions & 1 deletion environment_singularity.yml → environment_apptainer.yml
@@ -4,5 +4,7 @@ channels:
- conda-forge
- anaconda
dependencies:
- snakemake=7.32.3
- snakemake=8.4.11
- snakemake-wrapper-utils
- apptainer

3 changes: 2 additions & 1 deletion workflow/Snakefile
@@ -1,7 +1,7 @@
from snakemake.utils import min_version

##### set minimum snakemake version #####
min_version("6.4.1")
min_version("8.0.0")

#### setup #######
configfile: "config/config.yaml"
@@ -23,6 +23,7 @@ include: "rules/genefusion.smk"
include: "rules/altsplicing.smk"
include: "rules/exitron.smk"
include: "rules/indel.smk"
include: "rules/custom.smk"
include: "rules/germline.smk"
include: "rules/annotation.smk"
include: "rules/prioritization.smk"
1 change: 1 addition & 0 deletions workflow/envs/basic.yml
@@ -9,3 +9,4 @@ dependencies:
- pyfaidx
- biopython=1.78
- gffutils
- pysam
28 changes: 17 additions & 11 deletions workflow/rules/align.smk
@@ -1,6 +1,6 @@
### align reads to genome using STAR (when reads are in FASTQ)
if config['data']['rnaseq_filetype'] == '.fastq' or config['data']['rnaseq_filetype'] == '.fq':
rule star_fq_paired_end:
rule star_align_fastq:
input:
unpack(get_star_input),
faidx = "resources/refs/genome.fasta.fai",
@@ -19,22 +19,23 @@ if config['data']['rnaseq_filetype'] == '.fastq' or config['data']['rnaseq_filet
--outSAMattributes RG HI \
--outSAMattrRGline ID:{wildcards.group} \
--outFilterMultimapNmax 50 \
--peOverlapNbasesMin 20 \
--peOverlapNbasesMin 15 \
--alignSplicedMateMapLminOverLmate 0.5 \
--alignSJstitchMismatchNmax 5 -1 5 5 \
--chimOutType WithinBAM HardClip \
--chimSegmentMin 20 \
--chimJunctionOverhangMin 10 \
--chimScoreDropMax 30 \
--chimSegmentMin {config["align"]["chimSegmentMin"]} \
--chimJunctionOverhangMin {config["align"]["chimJunctionOverhangMin"]} \
--chimScoreDropMax {config["align"]["chimScoreDropMax"]} \
--chimScoreMin {config["align"]["chimScoreMin"]} \
--chimScoreJunctionNonGTAG 0 \
--chimScoreSeparation 1 \
--chimScoreSeparation {config["align"]["chimScoreSeparation"]} \
--chimSegmentReadGapMax 3 \
--chimMultimapNmax 50 \
--outSAMstrandField intronMotif"""
threads: config['threads']
wrapper:
"v2.2.1/bio/star/align"

### align reads to genome using STAR (when reads are in BAM - no preprocessing performed)
if config['data']['rnaseq_filetype'] == '.bam':
checkpoint split_bamfile_RG:
@@ -88,12 +89,17 @@ if config['data']['rnaseq_filetype'] == '.bam':
extra=lambda wildcards: f"""--outSAMtype BAM Unsorted --genomeSAindexNbases 10 \
--readFilesCommand zcat \
--outSAMattributes RG HI --outSAMattrRGline ID:{wildcards.rg} \
--outFilterMultimapNmax 50 --peOverlapNbasesMin 20 \
--outFilterMultimapNmax 50 \
--peOverlapNbasesMin 15 \
--alignSplicedMateMapLminOverLmate 0.5 \
--alignSJstitchMismatchNmax 5 -1 5 5 \
--chimOutType WithinBAM HardClip --chimSegmentMin 20 \
--chimJunctionOverhangMin 10 --chimScoreDropMax 30 \
--chimScoreJunctionNonGTAG 0 --chimScoreSeparation 1 \
--chimOutType WithinBAM HardClip \
--chimSegmentMin {config["align"]["chimSegmentMin"]} \
--chimJunctionOverhangMin {config["align"]["chimJunctionOverhangMin"]} \
--chimScoreDropMax {config["align"]["chimScoreDropMax"]} \
--chimScoreMin {config["align"]["chimScoreMin"]} \
--chimScoreJunctionNonGTAG 0 \
--chimScoreSeparation {config["align"]["chimScoreSeparation"]} \
--chimSegmentReadGapMax 3 --chimMultimapNmax 50 \
--outSAMstrandField intronMotif"""
threads: config['threads']
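After this change, both STAR rules read the same five chimeric-alignment values from `config["align"]`. Below is a minimal sketch of building that part of the `extra` string in one place. It assumes the key names shown in `config/config.yaml`, where each key is the name of the corresponding STAR option without the leading `--`. `star_chim_args` is hypothetical and not part of ScanNeo2.

```python
# Keys of the `align` section in config/config.yaml; each one is also the
# name of a STAR command-line option (without the leading "--").
CHIM_KEYS = (
    "chimSegmentMin",
    "chimJunctionOverhangMin",
    "chimScoreDropMax",
    "chimScoreMin",
    "chimScoreSeparation",
)


def star_chim_args(align_cfg):
    """Build the configurable --chim* options for the STAR `extra` string."""
    missing = [k for k in CHIM_KEYS if k not in align_cfg]
    if missing:
        raise KeyError(f"align section lacks: {', '.join(missing)}")
    # int() guards against values that arrive as strings.
    return " ".join(f"--{k} {int(align_cfg[k])}" for k in CHIM_KEYS)
```

Inside a rule, the returned string could replace the five individual `--chim...` lines of `extra`. Fixed options such as `--chimScoreJunctionNonGTAG 0` would stay as they are.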